Foreword



The Workshop on Randomization and Approximation Techniques in Computer 
submissions by the program committee and a number of other referees. Extensive
feedback was provided to authors as a result, which we hope has proven helpful 
to them. 
Disjoint Paths in Expander Graphs via Random
Walks: a Short Survey 



Alan M. Frieze* 

Department of Mathematical Sciences 
Carnegie-Mellon University
Pittsburgh 
PA 15213 
USA 



Abstract 

There has been a significant amount of research lately on solving the edge 
disjoint path and related problems on expander graphs. We review the random 
walk approach of Broder, Frieze and Upfal. 



1 Introduction 

The basic problem discussed in this paper can be described as follows: we are given
a graph $G = (V, E)$ and a set of $K$ disjoint pairs of vertices $(a_i, b_i)$ in $V$. If possible, find
edge disjoint paths $P_i$ that join $a_i$ to $b_i$ for $i = 1, 2, \ldots, K$. We call this the Edge
Disjoint Paths problem. We also say that $G$ is $K$-routable if such paths exist for any
set of $K$ pairs. For arbitrary graphs, deciding whether such paths exist is in P for
fixed $K$ (Robertson and Seymour [16]), but is NP-complete if $K$ is part of the input,
being one of Karp's original problems. This negative result can be circumvented
for certain classes of graphs, see Frank [7]. In this paper we will focus on expander
graphs. There have been essentially two bases for approaches to this problem in this
context: (i) random walks and (ii) multicommodity flows. Our aim here is to provide a
summary of the results known to us at present together with an outline of some of their
proofs. We emphasise the random walk approach; see [11, 12, 13] for more detail on
the multicommodity flow approach.

Expander Graphs For certain bounded degree expander graphs, Peleg and Upfal
[15] showed that if $G$ is a sufficiently strong expander then $G$ is $n^{\epsilon}$-routable for some
small constant $\epsilon \ll 1/3$ that depends only on the expansion properties of the input
graph. Furthermore, there is a polynomial time algorithm for constructing such paths.

This result has now been substantially improved and there is only a small factor 
(essentially log n) between upper and lower bounds for maximum routability. 

*Supported in part by NSF grant CCR9530974. E-mail: alan@random.math.cmu.edu. 

M. Luby, J. Rolim, and M. Serna (Eds.): RANDOM’98, LNCS 1518, pp. 1-14, 1998.
Using random walks, Broder, Frieze and Upfal [2] improved the result of [15] to
obtain the same result for $K = n/(\log n)^{\kappa}$, where $\kappa$ depends only on the expansion
properties of the graph. More recently, they [3] improved this by replacing $\kappa$ by $2 + \epsilon$
for any constant $\epsilon > 0$, at the expense of requiring greater expansion properties
of $G$. More recently still, Leighton, Rao and Srinivasan [13], using the rival
multicommodity flow technology, have improved on this by showing that the $\epsilon$ can be
replaced by $o(1)$. In Section 4 we will show how the random walk approach can be
improved to give the same result (Theorem 3). It is rather interesting that these, in many
ways quite different, approaches seem to yield roughly the same results. We note that
both approaches yield a non-constructive proof [3], [12] (via the local lemma) that in a
sufficiently strong expander $K = \Omega(n/(\log n)^2)$ is achievable.

Random Graphs Random graphs are well known to be excellent expanders and so
it is perhaps not surprising that they are very highly “routable”. Broder, Frieze, Suen and
Upfal [4] and Frieze and Zhao [9] (see Theorems 7 and 8) show that they are $K$-routable
where $K$ is within a constant factor of a simple upper bound, something that has not
yet been achieved for arbitrary expander graphs.

Low Congestion Path Sets One way of generalising the problem is to bound the 
number of paths that use any one edge, the edge congestion, by some value g in place 
of one. Bounds on the number of routable pairs in this case are given in Theorem 5. 

Dynamic problem In the dynamic version of the problem each vertex receives an
infinite stream of requests for paths starting at that vertex. The times between requests
are random, and paths are only required for a certain time (until the communication
terminates), after which the path is deleted. Again, each edge in the network should not be
used by more than $g$ paths at once.

The random walk approach gives a simple and fully distributed solution for this
problem. In [3] (see Theorem 6) we show that if the injection of requests into the network and
the duration of connections are both controlled by Poisson processes then there is an
algorithm which achieves a steady state utilization of the network which is similar to
the utilization achieved in the static case (Theorem 5).

Approximation Algorithm So far we have only considered the case where all
requests for paths have to be filled. If this is not possible then one might be satisfied
with filling as many requests as possible. Kleinberg and Rubinfeld [10] (see Theorem
10) prove that a certain greedy strategy has a worst-case performance ratio of
order $1/(\log n \log\log n)$.

Vertex Disjoint Paths Finally, there is the problem of finding vertex disjoint paths 
between a given set of pairs of vertices. In the worst case one cannot do better than the
minimum degree of the graph. The interest therefore must be on graphs with degrees 
which grow with the size of the graph. In this context random graphs [5] have optimal 
routing properties, to within a constant factor. 

The structure of the paper is now as follows: Section 3 discusses the problem of
splitting an expander, a basic requirement for finding edge disjoint paths. Section 4 
details the aforementioned results on expander graphs and outlines some of the proofs 
for the random walk approach. Section 5 details the results on random graphs and 
outlines the corresponding proofs. Section 6 describes the result of Kleinberg and 
Rubinfeld. A final section provides some open problems. 
2 Preliminaries



There are various ways to define expander graphs; here we define them in terms of 
edge expansion (a weaker property than vertex expansion). 

For a set of vertices $S \subseteq V$ let $\mathrm{out}(S)$ be the number of edges with one end-point
in $S$ and one end-point in $V \setminus S$, that is

$$\mathrm{out}(S) = |\{\{u, v\} \in E : u \in S,\ v \notin S\}|.$$

Similarly,

$$\mathrm{in}(S) = |\{\{u, v\} \in E : u, v \in S\}|.$$

Definition 1 A graph $G = (V, E)$ is a $\beta$-expander if for every set $S \subseteq V$ with $|S| \le |V|/2$
we have $\mathrm{out}(S) \ge \beta|S|$.
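The definitions of $\mathrm{out}(S)$, $\mathrm{in}(S)$ and $\beta$-expansion can be checked by brute force on toy graphs, since the definition ranges over all sets $S$ with $|S| \le |V|/2$. The Python sketch below is only illustrative: the function names are our own, graphs are given as lists of undirected edges, and the search is exponential in $|V|$.

```python
# Illustrative sketch; helper names are hypothetical.
from itertools import combinations


def out_count(edges, S):
    """out(S): number of edges with exactly one end-point in S."""
    return sum(1 for u, v in edges if (u in S) != (v in S))


def in_count(edges, S):
    """in(S): number of edges with both end-points in S."""
    return sum(1 for u, v in edges if u in S and v in S)


def best_beta(vertices, edges):
    """Largest beta with out(S) >= beta * |S| for every S, |S| <= |V|/2.

    Exhaustive over all subsets, so only usable on toy graphs."""
    vertices = list(vertices)
    best = float("inf")
    for size in range(1, len(vertices) // 2 + 1):
        for S in combinations(vertices, size):
            best = min(best, out_count(edges, set(S)) / size)
    return best


def is_beta_expander(vertices, edges, beta):
    return best_beta(vertices, edges) >= beta


# The 6-cycle: three consecutive vertices have out(S) = 2, so beta = 2/3.
cycle6 = [(i, (i + 1) % 6) for i in range(6)]
print(best_beta(range(6), cycle6))
```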

For certain results we need expanders that have the property that the expansion of small 
sets is not too small. The form of definition given below is taken from [3]. 



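Definition 2 An $r$-regular graph $G = (V, E)$ is called an $(\alpha, \beta, \gamma)$-expander if for
every set $S \subseteq V$

$$\mathrm{out}(S) \ge \begin{cases} (1 - \alpha) r |S| & \text{if } |S| \le \gamma |V|, \\ \beta |S| & \text{if } \gamma |V| < |S| \le |V|/2. \end{cases}$$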



In particular, random regular graphs and the (explicitly constructible) Ramanujan
graphs of Lubotzky, Phillips and Sarnak [14] are $(\alpha, \beta, \gamma)$-expanders with
$\alpha = O(\gamma + r^{-1/2})$ and $\beta$ close to $r/4$.

A random walk on an undirected graph $G = (V, E)$ is a Markov chain $\{X_t\} \subseteq V$
associated with a particle that moves from vertex to vertex according to the following
rule: the probability of a transition from vertex $i$, of degree $d_i$, to vertex $j$ is $1/d_i$ if
$\{i, j\} \in E$, and 0 otherwise. (In the case of a bipartite graph we need to assume that we do
nothing with probability 1/2 and move with probability 1/2 only. This technicality
is ignored for the remainder of the paper.) Its stationary distribution, denoted $\pi$, is
given by $\pi(v) = d_v/(2|E|)$. Obviously, for regular graphs, the stationary distribution
is uniform.
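As a sanity check on the formula $\pi(v) = d_v/(2|E|)$, the sketch below (our own helper names, exact rational arithmetic via Python's `fractions` module) builds the transition probabilities of the simple random walk on a small non-regular graph and verifies that one step of the chain leaves $\pi$ unchanged.

```python
# Illustrative sketch; helper names and the example graph are hypothetical.
from fractions import Fraction


def transition_probs(adj):
    """P[i][j] = 1/d_i for each neighbour j of i (simple random walk)."""
    return {i: {j: Fraction(1, len(nbrs)) for j in nbrs} for i, nbrs in adj.items()}


def stationary(adj):
    """pi(v) = d_v / (2|E|); the sum of all degrees equals 2|E|."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    return {v: Fraction(len(nbrs), two_m) for v, nbrs in adj.items()}


def one_step(dist, P):
    """Distribution after one more step: new(w) = sum_v dist(v) * P[v][w]."""
    new = {v: Fraction(0) for v in P}
    for v, p_v in dist.items():
        for w, p_vw in P[v].items():
            new[w] += p_v * p_vw
    return new


# A non-regular example: a triangle {1, 2, 3} with a pendant vertex 0 attached to 1.
adj = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2]}
pi = stationary(adj)
assert one_step(pi, transition_probs(adj)) == pi   # pi is invariant
```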

A trajectory $W$ of length $\tau$ is a sequence of vertices $[w_0, w_1, \ldots, w_\tau]$ such that
$\{w_t, w_{t+1}\} \in E$ for $0 \le t < \tau$. The Markov chain $\{X_t\}$ induces a probability distribution on
trajectories, namely the product of the probabilities of the transitions that define the
trajectory.

Let $P$ denote the transition probability matrix of the random walk on $G$, and let
$P_v^{(t)}(w)$ denote the probability that the walk is at $w$ at step $t$ given that it started at $v$. Let
$\lambda$ be the second largest eigenvalue of $P$. (All eigenvalues of $P$ are real.) It is known
that

$$|P_v^{(t)}(w) - \pi(w)| \le \lambda^t \sqrt{\pi(w)/\pi(v)}. \qquad (1)$$

In particular, for regular graphs

$$P_v^{(t)}(w) = \frac{1}{n} + O(\lambda^t). \qquad (2)$$
To ensure rapid convergence we need $\lambda \le 1 - \epsilon$ for some constant $\epsilon > 0$. This holds
for all expanders (Alon [1]). In particular, if $G$ is a $\beta$-expander with maximum degree
$\Delta$, then Jerrum and Sinclair [17] show that

$$\lambda \le 1 - \frac{\beta^2}{2\Delta^2}. \qquad (3)$$



It is often useful to consider the separation $s$ of the distribution $P_v^{(t)}$ from the limit
distribution $\pi$, given by

$$s(t) = \max_{v, w} \frac{\pi(w) - P_v^{(t)}(w)}{\pi(w)}. \qquad (4)$$



Then we can write

$$P_v^{(t)} = (1 - s(t))\pi + s(t)\sigma,$$

where $\sigma$ is a probability distribution. We can then imagine that the distribution $P_v^{(t)}$
is produced by choosing either $\sigma$ with probability $s(t)$ or $\pi$ with probability $1 - s(t)$.
Hence if $\mathcal{E}$ is an event that depends only on the state of the Markov chain we have

$$(1 - s(t))\Pr(\mathcal{E} \text{ under } \pi) + s(t) \ge \Pr(\mathcal{E} \text{ under } P_v^{(t)}) \ge (1 - s(t))\Pr(\mathcal{E} \text{ under } \pi). \qquad (5)$$



We use this in the following scenario:

Experiment A: Choose $u_1 \in V$ with distribution $\pi$ and do a random walk $W_1$ of
length $\tau$ from $u_1$. Let $v_1$ be the terminal vertex of $W_1$.

Experiment B: Choose $u_2$ and $v_2$ independently from $V$ with distribution $\pi$ and do a
random walk $W_2$ of length $\tau$ from $u_2$ to $v_2$ (that is, a walk from $u_2$ conditioned to end at $v_2$).
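Experiment B needs a walk from $u_2$ that is conditioned to end at $v_2$. One standard way to sample such a walk exactly, which we assume here rather than take from the source, is a Doob $h$-transform: precompute the probability of reaching $v_2$ in the remaining number of steps and weight each move by it. The following Python sketch does this for the simple random walk; `walk_bridge` and `experiment_b` are hypothetical names.

```python
# Illustrative sketch; walk_bridge and experiment_b are hypothetical helpers.
import random


def walk_bridge(adj, u, v, tau, rng=random):
    """Simple random walk of length tau from u, conditioned to end at v.

    h[k][y] is the probability that a k-step walk from y ends at v; the next
    step from x goes to neighbour y with probability proportional to
    P(x, y) * h[k - 1][y], which is the exact conditional law."""
    h = [{y: 1.0 if y == v else 0.0 for y in adj}]
    for _ in range(tau):
        prev = h[-1]
        h.append({y: sum(prev[z] for z in adj[y]) / len(adj[y]) for y in adj})
    if h[tau][u] == 0.0:
        raise ValueError("no walk of length tau joins u to v")
    walk, x = [u], u
    for remaining in range(tau, 0, -1):
        nbrs = list(adj[x])
        # P(x, y) = 1/d_x for every neighbour, so only h matters for the weights
        weights = [h[remaining - 1][y] for y in nbrs]
        x = rng.choices(nbrs, weights=weights)[0]
        walk.append(x)
    return walk


def experiment_b(adj, tau, rng=random):
    """Draw u2, v2 independently from pi (proportional to degree), then a bridge."""
    vertices = list(adj)
    degrees = [len(adj[x]) for x in vertices]
    u2, v2 = rng.choices(vertices, weights=degrees, k=2)
    return u2, v2, walk_bridge(adj, u2, v2, tau, rng)
```

For example, on the 5-cycle `walk_bridge(c5, 0, 2, 6)` returns a list of 7 vertices starting at 0 and ending at 2. On bipartite graphs only walks of the right parity exist, which is why the function raises an error when no walk of length $\tau$ joins the two endpoints.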

We claim that for any event $\mathcal{E}$ depending on walks of length $\tau$,

$$|\Pr((u_1, v_1, W_1) \in \mathcal{E}) - \Pr((u_2, v_2, W_2) \in \mathcal{E})| \le s(\tau). \qquad (6)$$

This follows from the stronger claim that for any $u \in V$ and any event $\mathcal{E}$ depending on
walks of length $\tau$

$$|\Pr((u_1, v_1, W_1) \in \mathcal{E} \mid u_1 = u) - \Pr((u_2, v_2, W_2) \in \mathcal{E} \mid u_2 = u)| \le s(\tau),$$

which follows from (5).
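The separation (4) and the decomposition behind (5) can be computed exactly on small graphs. The Python sketch below (illustrative helper names; exact fractions) does so for the triangle, where $s(t)$ shrinks by a factor of 4 every two steps, and recovers the residual distribution $\sigma$ for one start vertex.

```python
# Illustrative sketch; helper names are hypothetical.
from fractions import Fraction


def step_distributions(adj, t):
    """Exact P_v^{(t)} for every start vertex v of the simple random walk."""
    P = {i: {j: Fraction(1, len(n)) for j in n} for i, n in adj.items()}
    result = {}
    for v in adj:
        dist = {u: Fraction(1 if u == v else 0) for u in adj}
        for _ in range(t):
            new = {u: Fraction(0) for u in adj}
            for u, pu in dist.items():
                for w, puw in P[u].items():
                    new[w] += pu * puw
            dist = new
        result[v] = dist
    return result


def stationary(adj):
    two_m = sum(len(n) for n in adj.values())
    return {w: Fraction(len(adj[w]), two_m) for w in adj}


def separation(adj, t):
    """s(t) = max over v, w of (pi(w) - P_v^{(t)}(w)) / pi(w), as in (4)."""
    pi, Pt = stationary(adj), step_distributions(adj, t)
    return max((pi[w] - Pt[v][w]) / pi[w] for v in adj for w in adj)


def residual(adj, t, v):
    """sigma in P_v^{(t)} = (1 - s(t)) pi + s(t) sigma (needs s(t) > 0)."""
    pi, Pt, s = stationary(adj), step_distributions(adj, t), separation(adj, t)
    return {w: (Pt[v][w] - (1 - s) * pi[w]) / s for w in adj}


triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print([separation(triangle, t) for t in (1, 2, 3, 4)])
```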

The notation B(m,p) stands for the binomial random variable with parameters 
m = number of trials, and p = probability of success. 

3 Splitting an Expander 

Most of the algorithms we describe work in phases. Each phase generates paths and it is 
important that the sets of paths produced in each phase remain edge disjoint. One way 
of ensuring this is to insist that different phases work on different expander graphs. If
the input consists of a single expander then we need a procedure for partitioning $E$ into
$p$ sets, say $E_1, E_2, \ldots, E_p$, where the graphs $G_i = (V, E_i)$ are themselves expanders.

A natural way of trying to split $G$ into expander graphs is to randomly partition $E$
into $p$ sets. The problem with this is that in a bounded degree expander this will almost
surely lead to subgraphs with isolated vertices. We must find a partition which provides
a high minimum degree in every subgraph. The solution in [2, 3] is the following:

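Algorithm Partition(G, p)

1. Orient the edges of $G$ so that $|\mathrm{outdegree}(v) - \mathrm{indegree}(v)| \le 1$ for all $v \in V$.

2. For each $v \in V$ randomly partition the edges directed out of $v$ into $p$ sets
$E_{v,1}, \ldots, E_{v,p}$, each of size $\lfloor r/2p \rfloor$ or $\lceil r/2p \rceil$. Let $E_i = \bigcup_{v \in V} E_{v,i}$ for
$1 \le i \le p$.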
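A possible implementation of Algorithm Partition is sketched below in Python. For step 1 it uses a standard technique that the text does not specify: join every odd-degree vertex to an auxiliary vertex so that all degrees are even, then decompose the edges into closed trails and orient each trail in its direction of travel. For step 2 it shuffles each vertex's out-edges and deals them round-robin into $p$ parts, so part sizes differ by at most one. Function names are hypothetical.

```python
# Illustrative sketch of Algorithm Partition; function names are hypothetical.
import random
from collections import defaultdict


def balanced_orientation(n, edges):
    """Orient the edges of a graph on 0..n-1 so that |outdeg - indeg| <= 1.

    Every odd-degree vertex is joined to an auxiliary vertex n, making all
    degrees even; the edges are then peeled off as closed trails, each oriented
    in the direction it is walked.  Auxiliary edges are dropped at the end."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    all_edges = list(edges) + [(v, n) for v in range(n) if deg[v] % 2 == 1]
    adj = defaultdict(list)                  # vertex -> [(neighbour, edge id)]
    for eid, (u, v) in enumerate(all_edges):
        adj[u].append((v, eid))
        adj[v].append((u, eid))
    used = [False] * len(all_edges)
    ptr = defaultdict(int)                   # next unexamined incidence per vertex
    oriented = []
    for start in list(adj):
        while ptr[start] < len(adj[start]):
            x = start
            while True:                      # walk until stuck (back at start)
                while ptr[x] < len(adj[x]) and used[adj[x][ptr[x]][1]]:
                    ptr[x] += 1
                if ptr[x] == len(adj[x]):
                    break
                y, eid = adj[x][ptr[x]]
                used[eid] = True
                if eid < len(edges):         # keep only the original edges
                    oriented.append((x, y))
                x = y
    return oriented


def partition(n, edges, p, rng=random):
    """Sketch of Algorithm Partition(G, p); returns the edge sets E_1, ..., E_p."""
    out = defaultdict(list)
    for u, v in balanced_orientation(n, edges):
        out[u].append((u, v))
    parts = [[] for _ in range(p)]
    for v, arcs in out.items():
        arcs = arcs[:]
        rng.shuffle(arcs)
        labels = list(range(p))
        rng.shuffle(labels)                  # which parts get the larger share
        for k, arc in enumerate(arcs):       # sizes differ by at most one
            parts[labels[k % p]].append(arc)
    return parts
```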

Define $H(\gamma) = \left((1 - \gamma)^{1 - \gamma}\gamma^{\gamma}\right)^{-1}$ and $\psi(\epsilon) = (1 - \epsilon)\ln(1 - \epsilon) + \epsilon$.

Theorem 1 Let $G = (V, E)$ be an $r$-regular $n$-vertex graph that is an $(\alpha, \beta, \gamma)$-expander.
If $\epsilon \in (0, 1)$ and $p \le r/2$ are such that $\beta > p(\gamma\psi(\epsilon))^{-1}\ln H(\gamma)$,
then Partition splits the edge-set of $G$ into $p$ subgraphs. With probability at least¹
$1 - \exp\{-n(\beta\gamma\psi(\epsilon)/p - \ln H(\gamma) - o(1))\}$, all the $p$ subgraphs span $V$ and have
edge-expansion at least

• $\lfloor r/(2p) \rfloor - \alpha r$ for sets of size at most $\gamma n$;

• $(1 - \epsilon)\beta/p$ for sets of size between $\gamma n$ and $n/2$.

In particular each $G_i$ is a $\zeta$-expander where

$$\zeta = \min\{\lfloor r/(2p) \rfloor - \alpha r,\ (1 - \epsilon)\beta/p\}. \qquad (7)$$
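To see how the hypothesis of Theorem 1 and the constant $\zeta$ of (7) interact, the small calculator below evaluates them for given parameters. It is a sketch only: the parameter values in the example are made up for illustration, and the $o(1)$ term in the failure probability is ignored. Note that the hypothesis $\beta > p(\gamma\psi(\epsilon))^{-1}\ln H(\gamma)$ is exactly the statement that the exponent in the failure bound is positive.

```python
# Illustrative calculator; function names and example values are hypothetical.
import math


def H(gamma):
    """H(gamma) = ((1 - gamma)^(1 - gamma) * gamma^gamma)^(-1), so ln H is the entropy."""
    return 1.0 / ((1 - gamma) ** (1 - gamma) * gamma ** gamma)


def psi(eps):
    """psi(eps) = (1 - eps) ln(1 - eps) + eps."""
    return (1 - eps) * math.log(1 - eps) + eps


def theorem1_numbers(r, alpha, beta, gamma, eps, p):
    """Hypothesis of Theorem 1, zeta from (7), and the exponent in the
    failure probability exp{-n(exponent - o(1))} (the o(1) term is ignored)."""
    hypothesis = p <= r / 2 and beta > p * math.log(H(gamma)) / (gamma * psi(eps))
    zeta = min(r // (2 * p) - alpha * r, (1 - eps) * beta / p)
    exponent = beta * gamma * psi(eps) / p - math.log(H(gamma))
    return hypothesis, zeta, exponent


print(theorem1_numbers(r=400, alpha=0.05, beta=100, gamma=0.01, eps=0.5, p=2))
```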



Algorithm Partition does not seem to be the best way to proceed, but it is the best we know
constructively. Frieze and Molloy [8] have a stronger result which is close to optimal, but at
present it is non-constructive. Let the edge-expansion $\eta(G)$ of $G$ be defined by

$$\eta(G) = \min_{S \subseteq V,\ |S| \le n/2} \frac{\mathrm{out}(S)}{|S|}.$$



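Theorem 2 Let $p \ge 2$ be a positive integer and let $\epsilon > 0$ be a small positive real
number. Suppose that

$$\frac{r}{\log r} \ge 7p\epsilon^{-2} \quad\text{and}\quad \eta(G) \ge 4\epsilon^{-2}p\log r.$$

Then there exists a partition $E = E_1 \cup \cdots \cup E_p$ such that for $1 \le i \le p$ (where $\delta$ and $\Delta$ denote minimum and maximum degree)

(a) $\eta(G_i) \ge (1 - \epsilon)\eta(G)/p$,

(b) $(1 - \epsilon)r/p \le \delta(G_i) \le \Delta(G_i) \le (1 + \epsilon)r/p$.

¹The $o(1)$ term tends to 0 as $n \to \infty$.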


4 Finding paths in Expander graphs 

4.1 Edge Disjoint paths 

We will first concentrate on showing how, using random walks, we can achieve the same
bound on the number of routable pairs as given in [13].

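Fix an integer $s \ge 1$ and let $\log^{(s)}$ denote the natural logarithm iterated $s$ times, e.g.
$\log^{(2)} n = \log\log n$.

Let $m_1 = \log^{(s)} n$ and $m_{i+1} = \lceil \ldots \rceil$ for $i \ge 1$. Then let $K_i = \lfloor 2crn/(m_i(\log n)^2) \rfloor$ for $i \ge 1$,
and let $\sigma \le s + 2$ be the largest $i$ such that $K_i > 0$.
Here $c$ is a sufficiently small positive constant depending on $\zeta$, $r$ and $s$, where $\zeta$ is as in (7).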

Theorem 3 Suppose $G$ is an $r$-regular $n$-vertex graph that is an $(\alpha, \beta, \gamma)$-expander.
Suppose that $\zeta \ge 1$ in (7) when $p = 5\sigma$. Then $G$ is $crn/((\log n)^2 \log^{(s)} n)$-routable.

Proof We first split $G$ into $p = 5\sigma$ expander graphs $G_1, G_2, \ldots, G_p$ using algorithm
Partition. Note that the minimum degree in each $G_i$ is at least $r/(2p)$ and the
maximum degree is at most $r/2$.

Let $H_i = G_{5(i-1)+1} \cup \cdots \cup G_{5i}$ for $1 \le i \le \sigma$. The algorithm runs in phases.
Phase $i$ is left to deal with at most $K_i$ pairs left over from Phases 1 to $i - 1$, assuming
these phases have all succeeded. Thus Phase $\sigma$ should finish the job. We run Phase $i$ on
graph $H_i$ and this keeps the paths edge disjoint. Denote the set of source-sink pairs for
Phase $i$ by $S_{A,i} = \{a_{1,i}, \ldots, a_{K_i,i}\}$, $S_{B,i} = \{b_{1,i}, \ldots, b_{K_i,i}\}$. Phase $i$ is divided
into 4 subphases.

Subphase i.a: The aim here is to choose $w_j$, $W_j$, $1 \le j \le 2K_i$, such that (i)
$w_j \in W_j$, (ii) $|W_j| = m_i + 1$, (iii) the sets $W_j$, $1 \le j \le 2K_i$, are pairwise disjoint and
(iv) $W_j$ induces a connected subgraph of $F_i = G_{5(i-1)+2}$.

As in [11] we can partition an arbitrary spanning tree $T$ of $F_i$. Since $T$ has maximum
degree at most $r$ we can find $2K_i$ vertex disjoint subtrees $T_j$, $1 \le j \le 2K_i$, of $T$,
each containing between $m_i + 1$ and $(r - 1)m_i + 2$ vertices. We can find $T_1$ as follows:
choose an arbitrary root $\rho$ and let $Q_1, Q_2, \ldots$ be the subtrees of $\rho$. If there exists
$\ell$ such that $Q_\ell$ has between $m_i + 1$ and $(r - 1)m_i + 2$ vertices then we take $T_1 = Q_\ell$.
Otherwise we can search for $T_1$ in any $Q_\ell$ with more than $(r - 1)m_i + 2$ vertices. Since
$T \setminus T_1$ is connected, we can choose all of the $T_j$'s in this way. Finally, $W_j$ is the vertex
set of an arbitrary $(m_i + 1)$-vertex subtree of $T_j$ and $w_j$ is an arbitrary member of $W_j$ for
$j = 1, 2, \ldots, 2K_i$.
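The tree-splitting step of Subphase i.a can be sketched as follows. This Python version is a bottom-up variant of the greedy search described above rather than a transcription of it: it processes the tree children-first and cuts a subtree off as soon as it reaches $m_i + 1$ vertices, then takes a BFS prefix of each piece as $W_j$. Trees are adjacency dicts and the names are our own.

```python
# Illustrative sketch; split_tree and connected_subset are hypothetical helpers.
def split_tree(tree, root, m):
    """Cut vertex-disjoint subtrees with at least m + 1 vertices off a tree.

    Vertices are processed children first; as soon as the part still hanging
    from a vertex x has m + 1 or more vertices it is cut off as a piece rooted
    at x.  Every child part left attached has at most m vertices, so a piece
    rooted at a vertex with c children has at most c*m + 1 vertices.
    A leftover part around the root with fewer than m + 1 vertices is discarded."""
    parent, order = {root: None}, [root]
    for x in order:                          # BFS; the list grows while we scan it
        for y in tree[x]:
            if y not in parent:
                parent[y] = x
                order.append(y)
    attached = {x: {x} for x in order}       # vertices still hanging below x
    pieces = []
    for x in reversed(order):                # children before parents
        if len(attached[x]) >= m + 1:
            pieces.append((x, attached[x]))
        elif parent[x] is not None:
            attached[parent[x]] |= attached[x]
    return pieces


def connected_subset(tree, piece_root, piece, m):
    """W_j: the first m + 1 vertices of a BFS of the piece from its root
    (every vertex's BFS parent precedes it, so the subset is connected)."""
    chosen = [piece_root]
    for x in chosen:
        for y in tree[x]:
            if y in piece and y not in chosen and len(chosen) < m + 1:
                chosen.append(y)
    return chosen
```

Each cut removes a complete remaining subtree, so the rest of the tree stays connected, as in the argument above.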

Subphase i.b: Using a network flow algorithm in $G_{5(i-1)+1}$, connect in an arbitrary
manner the vertices of $S_i = S_{A,i} \cup S_{B,i}$ to $W_i = \{w_1, \ldots, w_{2K_i}\}$ by $2K_i$ edge disjoint paths
as follows:

• Assume that every edge in $G_{5(i-1)+1}$ has a capacity equal to 1.

• View each vertex in $S_i$ as a source with capacity 1 and similarly every vertex in
$W_i$ as a sink with capacity equal to 1.

The expansion properties of $G_{5(i-1)+1}$ ensure that such flows always exist.

Let $\hat a_{k,i}$ (resp. $\hat b_{k,i}$) denote the vertex in $W_i$ that was connected to the original
endpoint $a_{k,i}$ (resp. $b_{k,i}$). Our problem is now to find edge disjoint paths joining $\hat a_{k,i}$ to
$\hat b_{k,i}$ for $1 \le k \le K_i$.
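The flow step of Subphase i.b is a standard unit-capacity maximum flow. The Python sketch below assumes the graph is given as a list of undirected edges and uses BFS augmenting paths (Edmonds-Karp) followed by a path decomposition; it is not necessarily the flow algorithm used in [2, 3], and all names are hypothetical.

```python
# Illustrative sketch; edge_disjoint_routes is a hypothetical helper.
from collections import Counter, defaultdict, deque


def edge_disjoint_routes(edges, sources, sinks):
    """Join every source to a distinct sink by edge-disjoint paths, or return None.

    Unit capacity on each undirected edge, one unit of supply per occurrence
    of a source and of demand per occurrence of a sink, plus a super-source
    and super-sink.  Augmenting paths are found by BFS (Edmonds-Karp), and the
    resulting flow is peeled into one path per unit."""
    S, T = ("super", "source"), ("super", "sink")
    cap, adj = defaultdict(int), defaultdict(set)

    def add_arc(u, v, c):
        cap[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)

    for u, v in edges:                  # an undirected edge = two unit arcs
        add_arc(u, v, 1)
        add_arc(v, u, 1)
    for s, k in Counter(sources).items():
        add_arc(S, s, k)
    for t, k in Counter(sinks).items():
        add_arc(t, T, k)

    flow = defaultdict(int)             # kept antisymmetric: flow[u,v] == -flow[v,u]
    value = 0
    while True:
        prev, queue = {S: None}, deque([S])
        while queue and T not in prev:
            x = queue.popleft()
            for y in adj[x]:
                if y not in prev and cap[(x, y)] - flow[(x, y)] > 0:
                    prev[y] = x
                    queue.append(y)
        if T not in prev:
            break
        y = T
        while prev[y] is not None:      # push one unit along the augmenting path
            x = prev[y]
            flow[(x, y)] += 1
            flow[(y, x)] -= 1
            y = x
        value += 1
    if value < len(sources):
        return None

    paths = []
    for _ in range(value):
        path, x = [], S
        while x != T:
            y = next(z for z in adj[x] if flow[(x, z)] > 0)
            flow[(x, y)] -= 1
            flow[(y, x)] += 1
            if y != T:
                if y in path:           # remove a cycle so the path stays simple
                    path = path[: path.index(y)]
                path.append(y)
            x = y
        paths.append(path)
    return paths
```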

Subphase i.c: If $w_t$ has been renamed as $\hat a_{k,i}$ (resp. $\hat b_{k,i}$) then rename the elements
of $W_t$ other than $w_t$ as $a_{k,i,\ell}$ (resp. $b_{k,i,\ell}$), $1 \le \ell \le m_i$. Choose $\xi_j$, $1 \le j \le m_iK_i$, and $\eta_j$,
$1 \le j \le m_iK_i$, independently at random from the steady state distribution $\pi_i$ of a random walk
on $G_{5i}$. Using a network flow algorithm as in Subphase i.b, connect $\{a_{k,i,\ell} : 1 \le k \le K_i,\ 1 \le \ell \le m_i\}$
to $\{\xi_j : 1 \le j \le m_iK_i\}$ by edge disjoint paths in $G_{5i-2}$. Similarly,
connect $\{b_{k,i,\ell} : 1 \le k \le K_i,\ 1 \le \ell \le m_i\}$ to $\{\eta_j : 1 \le j \le m_iK_i\}$ by edge disjoint
paths in $G_{5i-1}$. Rename the other endpoint of the path starting at $a_{k,i,\ell}$ (resp. $b_{k,i,\ell}$) as
$\tilde a_{k,i,\ell}$ (resp. $\tilde b_{k,i,\ell}$).

Once again the expansion properties of $G_{5i-2}$, $G_{5i-1}$ ensure that flows exist.

Subphase i.d: Choose $x_{k,i,\ell}$, $1 \le k \le K_i$, $1 \le \ell \le m_i$, independently at random from the steady
state distribution $\pi_i$ of a random walk on $G_{5i}$. Let $W^A_{k,i,\ell}$ (resp.
$W^B_{k,i,\ell}$) be a random walk of length $\theta\log n$ from $\tilde a_{k,i,\ell}$ (resp. $\tilde b_{k,i,\ell}$) to $x_{k,i,\ell}$. Here
$\theta = O(r^2/\zeta^2)$ is chosen so that the separation (4) between $\pi_i$ and the distribution of
the terminal vertex of the walk is polynomially small in $n$. ((3) gives $\lambda_i \le 1 - 2\zeta^2/r^2$, where $\lambda_i$ is
the second largest eigenvalue of a random walk on $G_{5i}$.) The use of this intermediate
vertex $x_{k,i,\ell}$ helps to break some conditioning caused by the pairing up of the flow
algorithm.

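Let $B^A_{k,i}$ (resp. $B^B_{k,i}$) denote the bundle of walks $W^A_{k,i,\ell}$, $1 \le \ell \le m_i$ (resp.
$W^B_{k,i,\ell}$, $1 \le \ell \le m_i$). Following [13] we say that $W^A_{k,i,\ell}$ is bad if there exists $k' \ne k$
such that $W^A_{k,i,\ell}$ shares an edge with a walk in a bundle $B^A_{k',i}$ or $B^B_{k',i}$. Each walk starts
at an independently chosen vertex and moves to an independently chosen destination.
The steady state of a random walk is uniform on edges and so at each stage of a walk,
each edge is equally likely to be crossed. Since all the walks together make $O(K_im_i\theta\log n)$
steps while $G_{5i}$ has $\Theta(rn/p)$ edges, this gives

$$\Pr(W^A_{k,i,\ell} \text{ is bad}) = O\left(\frac{p\,\theta^2(\log n)^2 K_i m_i}{rn}\right) = O(cp\theta^2) \le \frac{1}{10}$$

for sufficiently small $c$.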

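We say that index $k$ is bad if either $B^A_{k,i}$ or $B^B_{k,i}$ contains more than $m_i/3$ bad walks.

If index $k$ is not bad then at most $2m_i/3$ of the $m_i$ choices of $\ell$ are spoiled by a bad walk, so we
can find a walk from $\tilde a_{k,i,\ell}$ to $\tilde b_{k,i,\ell}$ through $x_{k,i,\ell}$ for some
$\ell$ which is edge disjoint from all other walks. This gives a walk

$$a_{k,i} \to \hat a_{k,i} \to a_{k,i,\ell} \to \tilde a_{k,i,\ell} \to x_{k,i,\ell} \to \tilde b_{k,i,\ell} \to b_{k,i,\ell} \to \hat b_{k,i} \to b_{k,i},$$

which is edge-disjoint from all other such walks.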
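The bookkeeping for bad walks and bad indices can be expressed directly. In the hypothetical Python helper below, walks are vertex lists, bundles are indexed by $k$ and $\ell$ as in the text, and the function reports the bad indices together with a choice of $\ell$ for every index that is not bad.

```python
# Illustrative sketch; edges_of and select_walks are hypothetical helpers.
from collections import defaultdict


def edges_of(walk):
    return {frozenset(e) for e in zip(walk, walk[1:])}


def select_walks(bundles_a, bundles_b, m):
    """bundles_a[k][l], bundles_b[k][l]: the walks W^A_{k,i,l}, W^B_{k,i,l}.

    Returns (bad_indices, choice): index k is bad if either bundle has more
    than m/3 bad walks; otherwise choice[k] is an l whose two walks share no
    edge with the bundles of any other index k' != k."""
    users = defaultdict(set)                 # edge -> indices whose bundles use it
    for bundles in (bundles_a, bundles_b):
        for k, bundle in enumerate(bundles):
            for walk in bundle:
                for e in edges_of(walk):
                    users[e].add(k)

    def is_bad(walk, k):
        return any(users[e] - {k} for e in edges_of(walk))

    bad_indices, choice = [], {}
    for k in range(len(bundles_a)):
        bad_a = [is_bad(w, k) for w in bundles_a[k]]
        bad_b = [is_bad(w, k) for w in bundles_b[k]]
        if sum(bad_a) > m / 3 or sum(bad_b) > m / 3:
            bad_indices.append(k)            # left over for the next phase
            continue
        # at most 2m/3 of the m choices of l are spoiled, so one survives
        choice[k] = next(l for l in range(m)
                         if not bad_a[l] and not bad_b[l])
    return bad_indices, choice
```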

The probability that index $k$ is bad is at most

$$2\Pr(B(m_i, 1/10) > m_i/3) \le 2(3e/10)^{m_i/3}.$$

So, by Markov's inequality, with probability at least 1/2 the number of bad indices is no more than
$4K_i(3e/10)^{m_i/3} \le K_{i+1}$. By repetition we can ensure that Phase $i$ succeeds whp.
The theorem follows. □
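The tail estimate can be checked numerically. The sketch below compares the exact value of $2\Pr(B(m, 1/10) > m/3)$ with the bound $2(3e/10)^{m/3}$; the bound follows from $\Pr(B(m,p) \ge k) \le \binom{m}{k}p^k \le (emp/k)^k$ with $k \ge m/3$. Helper names are our own.

```python
# Illustrative check; upper_tail and bad_index_bound are hypothetical helpers.
from math import comb, e


def upper_tail(m, p, k):
    """Exact Pr(B(m, p) >= k)."""
    return sum(comb(m, j) * p**j * (1 - p) ** (m - j) for j in range(k, m + 1))


def bad_index_bound(m):
    """2 (3e/10)^(m/3): bound on Pr(index k is bad) when each walk is bad w.p. <= 1/10."""
    return 2 * (3 * e / 10) ** (m / 3)


for m in (6, 15, 30, 60):
    exact = 2 * upper_tail(m, 0.1, m // 3 + 1)    # "more than m/3 bad walks"
    print(m, exact, bad_index_bound(m))
```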



4.1.1 Existence Result 

In this section we describe the use of the Lovász Local Lemma [6] to prove the existence
of a large number of edge disjoint paths in any $r$-regular $(\alpha, \beta, \gamma)$-expander [3].
At present we do not see how to make the argument constructive.

Theorem 4 Given a bounded degree $(\alpha, \beta, \gamma)$-expander graph there exists a parameter
$c$ that depends on $\alpha, \beta, \gamma$, but not on $n$, such that any set of fewer than $cn/(\log n)^2$
disjoint pairs of vertices can be connected by edge disjoint paths.

The proof starts by splitting $G$ into two $\beta'$-expanders, $\beta' \ge 1$, and using the first to route
$a_1, \ldots, b_K$ to randomly chosen $\hat a_1, \ldots, \hat b_K$ via edge disjoint paths found through a
flow algorithm as in, say, Subphase i.b of the algorithm of the previous section.

Then, for $1 \le i \le K$, $\hat a_i$ is joined to $\hat b_i$ via an $O(\log n)$ random walk $W_i$ through
a randomly chosen intermediate vertex $x_i$. We use the local lemma to show that
$W_1, \ldots, W_K$ are edge disjoint with positive probability. Ignoring several technical
problems, we consider bad events $\mathcal{E}_{i,j} = \{W_i \cap W_j \ne \emptyset\}$ and argue that $\mathcal{E}_{i,j}$ depends
only on the $\le 2K$ events of the form $\mathcal{E}_{i,j'}$ or $\mathcal{E}_{i',j}$. Since $\Pr(\mathcal{E}_{i,j}) = O((\log n)^2/n)$
we can follow through if $K(\log n)^2/n \ll 1$. This gives the theorem.
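The arithmetic behind the local lemma step is easy to make concrete. In the sketch below the constant hidden in $\Pr(\mathcal{E}_{i,j}) = O((\log n)^2/n)$ is an assumed parameter `C`, and each event is taken to depend on $2K$ others as in the text; the resulting maximum $K$ scales like $n/(\log n)^2$, in line with Theorem 4.

```python
# Illustrative arithmetic; C is an assumed stand-in for the constant in O(.).
import math


def lll_condition(p, d):
    """Symmetric Lovasz Local Lemma: if every bad event has probability <= p and
    depends on at most d others, e * p * (d + 1) <= 1 gives Pr(no bad event) > 0."""
    return math.e * p * (d + 1) <= 1


def largest_K(n, C=1.0):
    """Largest K passing the condition with Pr(E_ij) <= C (log n)^2 / n and
    d = 2K dependent events."""
    p = C * math.log(n) ** 2 / n
    K = 0
    while lll_condition(p, 2 * (K + 1)):
        K += 1
    return K


n = 10**6
K = largest_K(n)
print(K, K * math.log(n) ** 2 / n)    # the ratio stays close to 1/(2e) when C = 1
```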



4.2 Low Congestion Paths 

We discuss the following result from [3]. 

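Theorem 5 There is an explicit polynomial time algorithm that can connect any set of
$K = \alpha(n)n/\log n$ pairs of vertices on a bounded degree expander so that no edge is
used by more than $g$ paths, where

$$g = \begin{cases} O\left(s + \dfrac{\log\log n}{\log(1/d)}\right) & \text{for } \alpha(n) < 1/2, \\ O(s + \alpha(n) + \log\log n) & \text{for } \alpha(n) \ge 1/2, \end{cases}$$

$d = \min(\alpha(n), 1/\log\log n)$, and $s$ is the maximal multiplicity of a vertex in the set of
pairs.

See [11] for similar results proved via multicommodity flows.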

See [11] for similar results proved via multi-commodity flows. 

The algorithm uses the same flow/random walk paradigm that we have already seen
twice above: $a_1, \ldots, b_K$ are joined to randomly chosen $\hat a_1, \ldots, \hat b_K$ via edge disjoint
paths found through a flow algorithm. Then, for $1 \le i \le K$, $\hat a_i$ is joined to $\hat b_i$ via
an $O(\log n)$ random walk $W_i$ through a randomly chosen intermediate vertex $x_i$. The
number of paths which use an edge is bounded by the sum of two binomials. We
then see that for a sufficiently large $k > 0$, whp there are fewer than $n/(\log n)^k$ edges
which have congestion greater than $g$. We delete all of the paths through such edges
and re-join the corresponding pairs via edge disjoint paths, using the algorithm of [2].

□
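The clean-up step (find edges whose congestion exceeds $g$ and re-route every pair whose path crosses one) can be sketched as below. Paths are vertex lists, congestion counts each path at most once per edge, and the function names are hypothetical.

```python
# Illustrative sketch; edge_loads and paths_to_reroute are hypothetical helpers.
from collections import Counter


def edge_loads(paths):
    """Congestion of each undirected edge: the number of paths that use it."""
    load = Counter()
    for path in paths:
        for e in {frozenset(x) for x in zip(path, path[1:])}:
            load[e] += 1
    return load


def paths_to_reroute(paths, g):
    """Indices of paths crossing some edge whose congestion exceeds g; in the
    scheme above these pairs are deleted and re-joined by edge disjoint paths."""
    load = edge_loads(paths)
    hot = {e for e, c in load.items() if c > g}
    return [i for i, path in enumerate(paths)
            if any(frozenset(x) in hot for x in zip(path, path[1:]))]
```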

Leighton, Rao and Srinivasan [12] generalise Theorem 4 by showing that for any
$B \ge 1$, given enough expansion, one can join $cn/(\log n)^{1+1/B}$ pairs with congestion
at most $B$.



4.3 Dynamic Allocation of Paths 

Broder, Frieze and Upfal [3] discuss a stochastic model for studying a dynamic version
of the circuit switching problem. In their model new requests for establishing paths
arrive continuously at nodes according to a discrete Poisson process. Requests wait in
the processor’s queue until the requested path is established. The duration of a path is
exponentially distributed.

Their model is characterized by three parameters:

• $P_1$ is an upper bound on the probability that a new request arrives at a given node
at a given step.

• $P_2$ is the probability that a given existing path is terminated in a given step. A
path lives from the time it is established until it is terminated.

• $g$ is the maximum congestion allowed on any edge.

In addition, the destinations of path requests are chosen uniformly at random among all the
graph vertices.

They study a simple and fully distributed algorithm for this problem. In the algorithm
each processor at each step becomes active with a probability $P_1' \ge P_1$. An inactive
processor does not try to establish a path even if there are requests in its queue. The
algorithm can be succinctly described: assume that $a$ is active at step $t$, and the first
request in $a$'s queue is for $b$. Processor $a$ tries to establish a path to $b$ by choosing a
random trajectory of length $\tau = C_0\log n$ connecting $a$ to $b$. If the path does not use any
edge with congestion greater than $g - 1$, the path is established; otherwise the request
stays in the queue.
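A minimal sketch of one attempt by a processor, under the assumption that the random trajectory has already been sampled (for example by conditioning a random walk on its endpoint): the processor flips its activity coin, then accepts the trajectory only if no edge on it already carries $g$ paths. The names and the `Counter`-based load table are our own choices, not part of [3].

```python
# Illustrative sketch; try_establish and terminate are hypothetical helpers.
import random
from collections import Counter


def edges_of(trajectory):
    return {frozenset(e) for e in zip(trajectory, trajectory[1:])}


def try_establish(trajectory, load, g, p_active, rng=random):
    """One step of processor a for its first queued request.

    trajectory: a random walk of length tau from a to b, sampled elsewhere.
    load: Counter from an edge frozenset({u, v}) to its number of live paths.
    Returns True if the path is established, False if the request stays queued."""
    if rng.random() >= p_active:
        return False                         # processor inactive this step
    edges = edges_of(trajectory)
    if any(load[e] > g - 1 for e in edges):
        return False                         # some edge is already fully loaded
    for e in edges:
        load[e] += 1
    return True


def terminate(trajectory, load):
    """Release the edges of a path when its connection ends."""
    for e in edges_of(trajectory):
        load[e] -= 1
```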



Theorem 6 Let

$$\Phi = \min\left\{\frac{1}{\log(grn)},\ \frac{rg}{(\log n)^{1+1/g}}\right\}.$$

There exists a constant $\gamma$ such that if $P_1 \le \gamma\Phi P_2$, then the system is stable and the
expected wait of a request in the queue is $O(1/P_1)$.



Before outlining the proof let us see the consequence of this theorem. Let $E(N) = nP_1$
be the expected number of new requests that arrive at the system at a given step, and
let $E(D) = 1/P_2$ be the expected duration of a connection. For the system to be
stable, the expected number of simultaneously active paths in the steady state must be
at least $E(N)E(D) = nP_1/P_2$. Plugging $g = \log\log n/\log\omega$ for some $\omega$ in the range
$[1, \log n]$ into the definition of $\Phi$ we get

$$\Phi = \Omega\left(\frac{1}{\omega\log n}\right).$$
Thus the theorem above implies that for such a congestion $g$, the system remains stable
even if we choose $P_1$ and $P_2$ such that

$$E(N)E(D) = \frac{nP_1}{P_2} = \gamma n\Phi = \Omega\left(\frac{n}{\omega\log n}\right),$$

in which case the dynamic algorithm utilizes the edges of the network almost as
efficiently as the static algorithm of Theorem 5 (there seems to be an efficiency gap of
order at most $\log\log\log n$ for $\omega \le \log\log n$).

In the proof of the theorem, time is partitioned into intervals of length $T \le 1/(4P_1)$.
Let $H_t$ denote the history of the system during the first $t$ time intervals. Define $S(v, t)$
to be the event that if the queue of processor $v$ was not empty at the beginning of interval $t$,
then $v$ served at least one request during interval $t$.

The goal is to show that for all $v$ and $t$,

$$\Pr(S(v, t) \mid H_{t-2}) \ge \frac{1}{2}. \qquad (8)$$

Given this we conclude that in any segment of $2T$ steps processor $v$ serves at least
one request with probability at least 1/2. The number of new arrivals in this time
interval has a Binomial distribution with expectation at most $2TP_1 \le 1/2$. Thus,
under these conditions the queue is dominated by an M/M/1 queue with expected
inter-arrival time greater than $4T$, and expected service time smaller than $4T$.
The queue is stable, and the expected wait in the queue is $O(T) = O(1/P_1)$.
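The domination argument can be illustrated with a crude simulation. The Python sketch below models one segment of $2T$ steps at a time, with a Bernoulli($P_1$) arrival chance at each step and one service with probability 1/2, which is the guarantee obtained from (8). It illustrates the drift argument under these simplifying assumptions and does not model the actual network dynamics; the function name and parameter values are hypothetical.

```python
# Illustrative simulation; average_queue and its parameters are hypothetical.
import random


def average_queue(segments, T, P1, serve_prob=0.5, seed=0):
    """Average queue length when each segment of 2T steps brings one
    Bernoulli(P1) arrival chance per step and one service w.p. serve_prob."""
    rng = random.Random(seed)
    queue = total = 0
    for _ in range(segments):
        queue += sum(rng.random() < P1 for _ in range(2 * T))
        if queue and rng.random() < serve_prob:
            queue -= 1
        total += queue
    return total / segments


stable = average_queue(20000, T=10, P1=0.015)      # 2*T*P1 = 0.3 <= 1/2
overloaded = average_queue(20000, T=10, P1=0.05)   # violates T <= 1/(4 P1)
print(stable, overloaded)
```

With $2TP_1 \le 1/2$ strictly below the service probability the queue stays short, while raising $P_1$ beyond $1/(4T)$ makes it grow linearly, which is the dichotomy the proof exploits.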

To prove (8) we argue that with sufficiently high probability, (i) v becomes active at 
least once during an interval, (ii) there are no very old paths in the network, (iii) there 
are not too many paths in the network altogether and then (iv) we can argue that the 
first path that a processor v tries to establish is unlikely to use a fully loaded edge. 



5 Random Graphs 

We deal with two related models of a random graph. $G_{n,m}$ has vertex set $[n] = \{1, 2, \ldots, n\}$
and exactly $m$ edges, all sets of $m$ edges having equal probability. The random graph $G_{r\text{-reg}}$
is chosen uniformly at random from the set of $r$-regular graphs with vertex set $[n]$.

Let $D$ be the median distance between pairs of vertices in the graph $G_{n,m}$. Clearly it
is not possible to connect more than $O(m/D)$ pairs of vertices by edge-disjoint paths,
for all choices of pairs, since some choice would require more edges than all the edges
available. In the case of bounded degree expanders, this absolute upper bound on $K$ is
$O(n/\log n)$. The results mentioned above use only a vanishing fraction of the set of
edges of the graph, and thus are far from reaching this upper bound. In contrast, Broder,
Frieze, Suen and Upfal [4] and Frieze and Zhao [9] show that for $G_{n,m}$ and $G_{r\text{-reg}}$
the absolute upper bound is achievable within a constant factor, and present algorithms
that construct the required paths in polynomial time.
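The quantity $D$ is easy to estimate empirically. The sketch below samples $G_{n,m}$ by drawing distinct edges uniformly at random (rejection of duplicates gives the uniform distribution on $m$-edge graphs) and computes the median BFS distance over reachable ordered pairs; the sizes in the example are arbitrary and the function names are our own.

```python
# Illustrative sketch; gnm and median_distance are hypothetical helpers.
import random
from collections import deque
from statistics import median


def gnm(n, m, seed=0):
    """Uniform random graph G_{n,m}: m distinct edges chosen from all pairs of [n]."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < m:
        u, v = rng.sample(range(n), 2)
        edges.add((min(u, v), max(u, v)))
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def median_distance(adj):
    """Median BFS distance over ordered pairs of distinct, mutually reachable vertices."""
    dists = []
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        dists.extend(d for v, d in dist.items() if v != s)
    return median(dists)


adj = gnm(200, 1000, seed=1)          # d = 2m/n = 10
D = median_distance(adj)
print(D, 1000 / D)                    # D and the O(m/D) bound on routable pairs
```

With $n = 200$ and $m = 1000$ the median distance should be a small number close to $\log n/\log d \approx 2.3$, so $m/D$ is a few hundred.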
Theorem 7 Let $m = m(n)$ be such that $d = 2m/n \ge (1 + o(1))\log n$. Then, as
$n \to \infty$, with probability $1 - o(1)$, the graph $G_{n,m}$ has the following property: there
exist positive constants $\alpha$ and $\beta$ such that for all sets of pairs of vertices
$\{(a_i, b_i) \mid i = 1, \ldots, K\}$ satisfying:

(i) $K \le \lceil \alpha m\log d/\log n \rceil$,

(ii) for each vertex $v$, $|\{i : a_i = v\}| + |\{i : b_i = v\}| \le \min\{d_G(v), \beta d\}$,

there exist edge-disjoint paths in $G$, joining $a_i$ to $b_i$, for each $i = 1, 2, \ldots, K$.
Furthermore, there is an $O(nm^2)$ time randomized algorithm for constructing these paths.

Theorem 8 Let $r$ be a sufficiently large constant. Then, as $n \to \infty$, the graph $G_{r\text{-reg}}$
has the following property whp: there exist positive absolute constants $\alpha, \beta$ such that
for all sets of pairs of vertices $\{(a_i, b_i) \mid i = 1, \ldots, K\}$ satisfying:

(i) $K \le \lceil \alpha rn/\log_r n \rceil$,

(ii) for each vertex $v$, $|\{i : a_i = v\}| + |\{i : b_i = v\}| \le \beta r$,

there exist edge-disjoint paths in $G_{r\text{-reg}}$, joining $a_i$ to $b_i$, for each $i = 1, 2, \ldots, K$.
Furthermore, there is an $O(n^3)$ time randomized algorithm for constructing these
paths.

These results are best possible up to constant factors. Consider for example Theorem 7.
For (i) note that the distance between most pairs of vertices in $G_{n,m}$ is $\Omega(\log n/\log d)$,
and thus with $m$ edges we can connect at most $O(m\log d/\log n)$ pairs. For (ii) note
that a vertex $v$ can be the endpoint of at most $d_G(v)$ different paths. Furthermore,
suppose that $d \ge n^{\gamma}$ for some constant $\gamma > 0$, so that $K \ge \lceil \alpha\gamma nd/2 \rceil$. Let
$\epsilon = \alpha\gamma/3$, $A = [\epsilon n]$, and $B = [n] \setminus A$. Now with probability $1 - o(1)$ there are fewer than
$(1 + o(1))\epsilon(1 - \epsilon)nd$ edges between $A$ and $B$ in $G_{n,m}$. However, almost all vertices of
$A$ have degree $(1 + o(1))d$, and if for these vertices we ask for $(1 - \epsilon/2)d$ edge-disjoint
paths to vertices in $B$, then the number of paths required is at most $(1 + o(1))\epsilon(1 - \epsilon/2)nd < K$,
but, without further restrictions, this many paths would require at least
$(1 - o(1))\epsilon(1 - \epsilon/2)nd > (1 + o(1))\epsilon(1 - \epsilon)nd$ edges between $A$ and $B$, which is more
than what is available. This justifies an upper bound of $1 - \epsilon/2$ for $\beta$ of Theorem 7. A
similar argument justifies the bounds in Theorem 8.

First consider Theorem 7. The edge set $E$ is first split randomly into 5 sets $E_1, E_2, \ldots, E_5$.
The graphs $G_i = (V, E_i)$ will all be good expanders whp. As usual, $G_1$
is used to connect $a_1, \ldots, b_K$ to randomly chosen $\hat a_1, \ldots, \hat b_K$ using network flows.
A random walk is then done in $G_2$ starting at each of these latter vertices and ending
at $\tilde a_1, \ldots, \tilde b_K$. After each walk, its edges are deleted from $G_2$, which keeps the walks
edge disjoint. The length $\tau$ of these walks is $O(\log n/\log d)$ but long enough so that the
endpoints $\tilde a_1, \ldots, \tilde b_K$ are (essentially) independent of their start points. This handles
any possible conditioning introduced by the pairing of $\hat a_i$ with $\hat b_i$. Finally, $\tilde a_i$ is joined
to $\tilde b_i$ directly by a random walk of length $\tau$ in $G_3$. $G_4$, $G_5$ and the algorithm of [2]
are used to connect the few pairs not successfully joined by the above process.

The success of this algorithm rests on the fact that if not too many walks,
$O(m\log d/\log n)$, are deleted, then $O(m)$ edges will be deleted and we will easily
be able to ensure that the degrees of almost all vertices stay logarithmic in size. Thus
the remaining graphs will be good expanders. The reader should notice that we only
need a constant number of short random walks to connect each pair and this is why we
are within a constant factor of optimal.

In Theorem 8, where degrees are bounded, we find that this argument breaks down
because many (order $n$) vertices would become isolated through the deletion of the
requested number of walks. The cure for this is to force the “action” to take place on
a core of each subgraph. (The $k$-core of a graph $H$ is the largest subset of $V$ which
induces a subgraph of minimum degree at least $k$ in $H$. It is unique and can be found
by repeatedly removing vertices which have degree less than $k$.) This raises technical
problems, such as what is to be done when one of the endpoints of a proposed walk
drops out of the core. These problems are dealt with in [9].
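The peeling procedure for the $k$-core is short enough to state as code. A minimal Python sketch, assuming the graph is an adjacency dict (the function name is our own):

```python
# Illustrative sketch; k_core is a hypothetical helper.
from collections import deque


def k_core(adj, k):
    """Vertex set of the k-core: repeatedly delete vertices of degree < k."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    removed = set()
    queue = deque(v for v, d in degree.items() if d < k)
    while queue:
        v = queue.popleft()
        if v in removed:
            continue                      # a vertex may be queued more than once
        removed.add(v)
        for u in adj[v]:
            if u not in removed:
                degree[u] -= 1
                if degree[u] < k:
                    queue.append(u)
    return set(adj) - removed
```

Because a deletion can only lower other degrees, the order of removals does not matter, which is why the core is unique.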

The problem of finding vertex disjoint paths in random graphs is dealt with in 
Broder, Frieze, Suen and Upfal [5]. 

Theorem 9 Suppose $m = \frac{n}{2}(\log n + \omega)$ where $\omega(n) \to \infty$. Then there exist $\alpha, \beta > 0$
such that whp for all $A = \{a_1, a_2, \ldots, a_K\}$, $B = \{b_1, b_2, \ldots, b_K\} \subseteq [n]$ satisfying

(i) $A \cap B = \emptyset$,

(ii) $|A| = |B| \le \alpha n\log d/\log n$, where $d = 2m/n$,

(iii) $|N(v) \cap (A \cup B)| \le \beta|N(v)|$ for all $v \in [n]$,

there are vertex disjoint paths $P_i$ from $a_i$ to $b_i$ for $1 \le i \le K$. Furthermore, these paths
can be constructed by a randomised algorithm in $O(n^3)$ time.

This result is best possible up to constant factors. 



6 Approximation Algorithm 

Kleinberg and Rubinfeld [10] describe an on-line Bounded Greedy Algorithm (BGA) for
the edge disjoint paths problem. BGA is defined by a parameter $L$ as follows:

(i) Proceed through the terminal pairs in one pass.

(ii) When $(a_i, b_i)$ is considered, check whether $a_i$ and $b_i$ can still be joined by a path
of length at most $L$. If so, route $(a_i, b_i)$ on such a path $P_i$. Delete the edges of $P_i$ and iterate.

They prove the following: 

Theorem 10 Suppose $G$ is an expander of maximum degree $\Delta$. Then there exists $c > 0$
such that with $L = c\Delta\log n$, BGA is an $O(\log n\log\log n)$-approximation algorithm
for the edge disjoint paths problem.
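A minimal Python sketch of BGA follows. It assumes an adjacency-dict graph and uses a depth-limited BFS in the graph of remaining edges, so that a pair is routed exactly when a path of length at most $L$ still exists; the function name is hypothetical.

```python
# Illustrative sketch of BGA; bounded_greedy is a hypothetical helper.
from collections import deque


def bounded_greedy(adj, pairs, L):
    """One pass over the pairs; route a pair on a shortest path in the graph of
    remaining edges if that path has at most L edges, then delete its edges."""
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    routed = {}
    for i, (a, b) in enumerate(pairs):
        prev = {a: None}
        frontier = deque([(a, 0)])
        while frontier and b not in prev:
            x, d = frontier.popleft()
            if d == L:
                continue                      # do not extend beyond length L
            for y in remaining[x]:
                if y not in prev:
                    prev[y] = x
                    frontier.append((y, d + 1))
        if b not in prev:
            continue                          # no path of length <= L is left
        path = [b]
        while path[-1] != a:
            path.append(prev[path[-1]])
        path.reverse()
        for u, v in zip(path, path[1:]):      # delete the edges of P_i
            remaining[u].discard(v)
            remaining[v].discard(u)
        routed[i] = path
    return routed
```

For instance, on the path $0 - 1 - 2 - 3$ with pairs $(0, 3)$ and $(1, 2)$, taking $L = 3$ routes only the first pair (it uses the edge $\{1, 2\}$), whereas $L = 2$ rejects the first pair and routes the second.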
7 Final Remarks

There has been a lot of progress since the first paper of Peleg and Upfal. The most
interesting questions that remain, to my mind, are:

1. Can we take $K = \Omega(n/\log n)$, given sufficient expansion, in Theorem 3?

2. More modestly, can we remove the $\log^{(s)} n$ factor and make Theorem 4 constructive?

3. Can we achieve near optimal expander splitting as in Theorem 2 constructively?

4. Is there a constant factor approximation algorithm for the edge disjoint paths
problem on expander graphs?

5. Can any of the above results be extended to digraphs?
Abstract. Min-wise independence is a recently introduced notion of
limited independence, similar in spirit to pairwise independence. The
latter has proven essential for the derandomization of many algorithms.
Here we show that approximate min-wise independence allows similar
uses, by presenting a derandomization of the RNC algorithm for approximate
set cover due to S. Rajagopalan and V. Vazirani. We also discuss
how to derandomize their set multi-cover and multi-set multi-cover
algorithms in restricted cases. The multi-cover case leads us to discuss the
concept of k-minima-wise independence, a natural counterpart to k-wise
independence.



1 Introduction 

Carter and Wegman [6] introduced the concept of universal hashing in 1979, with the intent to offer an input-independent, constant average time algorithm for table look-up. Although hashing was invented in the mid-fifties, when for the first time memory became “cheap” and therefore sparse tables became of interest, up until the seminal paper of Carter and Wegman the premise of the theory and practice of hashing was that either the input is chosen at random or the hash function is chosen uniformly at random among all possible hash functions. Both premises are clearly unrealistic: inputs are not random, and the space needed to store a truly random hash function would dwarf the size of the table. Carter and Wegman showed that, in order to preserve the desirable properties of hashing, it suffices to pick the hash function from what is now called a pairwise independent family of hash functions. Such families of small size exist, and can be easily constructed.

Since then, pairwise independence and more generally k-wise independence have proven to be powerful algorithmic tools with significant theoretical and practical applications. (See the excellent survey by Luby and Wigderson [11]

* Supported by the Pierre and Christine Lamond Fellowship and in part by an ARO MURI Grant DAAH04-96-1-0007 and NSF Award CCR-9357849, with matching funds from IBM, Schlumberger Foundation, Shell Foundation, and Xerox Corporation.

M. Luby, J. Rolim, and M. Serna (Eds.): RANDOM’98, LNCS 1518, pp. 15-24, 1998.
and references therein.) One important theoretical application of pairwise independence is for the derandomization of algorithms. A well-known example is to find a large cut in a graph. One can color the vertices of a graph with $|E|$ edges randomly using two colors, the colors being determined by a pairwise independent hash function chosen at random from a small family. The colors define a cut, and on average the cut will have $|E|/2$ crossing edges, since the two endpoints of each edge receive different colors with probability 1/2. Hence, by trying every hash function in the family one finds a cut with at least the expected number of crossing edges, $|E|/2$.
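The following sketch makes this derandomization concrete under one specific, assumed choice of small family (the function name is hypothetical). Vertex $v$ gets the nonzero label $v+1$; the family member indexed by $(a, b)$ colors $v$ with the parity of the bitwise inner product of $a$ and the label, XOR $b$. For two distinct vertices the two colors are then independent uniform bits, so every edge is cut by exactly half of the family, and the best member cuts at least $|E|/2$ edges. The family has at most $4n$ members, so trying them all takes polynomial time.

```python
def max_cut_derandomized(n, edges):
    """Return (colors, cut) for the best 2-colouring of vertices 0..n-1 in a small
    pairwise independent family.

    Vertex v gets the nonzero label v + 1 written with k bits. The member (a, b)
    colours v with parity(popcount(a & label)) XOR b. For distinct vertices the two
    colours are independent uniform bits, so each edge is cut by exactly half of
    the 2**(k+1) members and the best member cuts at least len(edges) / 2 edges."""
    k = max(1, n.bit_length())                 # enough bits for labels 1..n
    best_colors, best_cut = None, -1
    for a in range(2 ** k):
        for b in (0, 1):
            colors = [(bin(a & (v + 1)).count("1") + b) % 2 for v in range(n)]
            cut = sum(1 for u, w in edges if colors[u] != colors[w])
            if cut > best_cut:
                best_colors, best_cut = colors, cut
    return best_colors, best_cut
```

On a triangle, for instance, some member separates one vertex from the other two, giving a cut of 2 edges, which is at least $3/2$ as promised.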

Recently, we introduced an alternative notion of limited independence based on what we call min-wise independent permutations [4]. Our motivation was the connection to an approach for determining the resemblance of sets, which can be used, for example, to identify documents on the World Wide Web that are essentially the same [2, 3, 5]. In this paper we demonstrate that the notion of min-wise independence can also prove useful for derandomization. Specifically, we use a polynomial-sized construction of approximate min-wise independent permutations due to Indyk to derandomize the parallel approximate set cover algorithm of Rajagopalan and Vazirani [12] (from now on called the RV-algorithm). This example furthers our hope that min-wise independence may prove a generally useful concept.

The paper proceeds as follows: in Section 2, we provide the definitions for min-wise and approximately min-wise independent families of permutations. We also state (without proof) Indyk’s results. In Section 3, we provide the necessary background for the RV-algorithm. In particular, we emphasize how the property of min-wise independence plays an important role in the algorithm. In Section 4, we demonstrate that the RV-algorithm can be derandomized using a polynomial-sized approximately min-wise independent family. Finally, in Section 5, we briefly discuss how to extend the derandomization technique to the set multi-cover and multi-set multi-cover algorithms proposed by Rajagopalan and Vazirani. This discussion motivates a generalization of min-wise independence to k-minima-wise independence, a natural counterpart to k-wise independence.

2 Min-wise independence 

We provide the necessary definitions for min-wise independence, based on [4]. 

Let $S_n$ be the set of all permutations of $[n]$. We say that $\mathcal{F} \subseteq S_n$ is exactly min-wise independent (or just min-wise independent where the meaning is clear) if for any set $X \subseteq [n]$ and any $x \in X$, when $\pi$ is chosen at random¹ from $\mathcal{F}$ we have

$$\Pr\big(\min\{\pi(X)\} = \pi(x)\big) = \frac{1}{|X|}. \qquad (1)$$

In other words we require that all the elements of any fixed set $X$ have an equal chance to become the minimum element of the image of $X$ under $\pi$.

¹ To simplify exposition we shall assume that $\pi$ is chosen uniformly at random from $\mathcal{F}$, although it could be advantageous to use another distribution instead. See [4].
We say that $\mathcal{F} \subseteq S_n$ is approximately min-wise independent with relative error $\epsilon$ (or just approximately min-wise independent where the meaning is clear) if for any set $X \subseteq [n]$ and any $x \in X$, when $\pi$ is chosen at random from $\mathcal{F}$ we have

$$\left|\Pr\big(\min\{\pi(X)\} = \pi(x)\big) - \frac{1}{|X|}\right| \le \frac{\epsilon}{|X|}. \qquad (2)$$

In other words we require that all the elements of any fixed set $X$ have only an almost equal chance to become the minimum element of the image of $X$ under $\pi$.
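For very small $n$, definitions (1) and (2) can be checked by brute force. The sketch below (a hypothetical helper; it enumerates every subset, so it is only usable for tiny $n$) computes the smallest $\epsilon$ for which a given family, drawn uniformly, satisfies (2); a result of 0 means the family satisfies (1). Permutations are 0-based tuples `p` with `p[x]` $= \pi(x)$.

```python
from fractions import Fraction
from itertools import combinations, permutations

def min_wise_error(family, n):
    """Smallest eps such that `family` (drawn uniformly) satisfies (2):
    max over nonempty X of range(n) and x in X of
    |X| * |Pr[min pi(X) = pi(x)] - 1/|X||."""
    worst = Fraction(0)
    for size in range(1, n + 1):
        for X in combinations(range(n), size):
            # For each permutation, record which element of X has the smallest image.
            wins = {x: 0 for x in X}
            for p in family:
                wins[min(X, key=lambda y: p[y])] += 1
            for x in X:
                prob = Fraction(wins[x], len(family))
                worst = max(worst, size * abs(prob - Fraction(1, size)))
    return worst

# The full symmetric group S_3 is exactly min-wise independent.
print(min_wise_error(list(permutations(range(3))), 3))  # prints 0
```

By contrast, the three cyclic shifts of $\{0, 1, 2\}$ have error $1/3$: on every two-element set one element wins with probability $2/3$ instead of $1/2$.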

Indyk has found a simple construction of approximately min-wise independent permutations with useful properties for derandomization [9]. The construction is derived from a family of hash functions that map $[n]$ to a larger set $[m]$ and have certain limited independence properties. A function $h : [n] \to [m]$ induces a permutation $\pi$ of $[n]$ as follows: sort the $n$ pairs $(h(x), x)$ for $x \in [n]$ in lexicographic order and define $\pi(x)$ to be the index of $(h(x), x)$ in the sorted order. Thus, a family of hash functions $\{h : [n] \to [m]\}$ induces a family of permutations of $[n]$. Indyk’s results imply the following proposition.



Proposition 1. [Indyk] There exist constants $c_1$ and $c_2$ such that, for any $c_1 \log(1/\epsilon)$-wise independent family $\mathcal{H}$ of hash functions from $[n]$ to $[c_2 n/\epsilon]$, the family of permutations on $[n]$ induced by $\mathcal{H}$ is approximately min-wise independent with relative error $\epsilon$.
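The induced-permutation map described above is easy to sketch. The code below is an illustration under assumptions, not Indyk's construction: as its $k$-wise independent hash family it uses all polynomials of degree less than $k$ over $\mathbb{Z}_p$ (exactly $k$-wise independent when $p$ is a prime with $p \ge n$), whereas Proposition 1 asks for $c_1 \log(1/\epsilon)$-wise independence with range $[c_2 n/\epsilon]$ and unspecified constants, and the construction described next uses almost $k$-wise independent bits instead. Positions are 0-based, while $\pi(x)$ in the text is 1-based; the function names are hypothetical.

```python
from itertools import product

def induced_permutation(h_values):
    """h_values[x] = h(x) for x in range(n). Returns pi with pi[x] equal to the
    0-based position of the pair (h(x), x) in lexicographic order, so ties in h
    are broken by x."""
    n = len(h_values)
    order = sorted(range(n), key=lambda x: (h_values[x], x))
    pi = [0] * n
    for position, x in enumerate(order):
        pi[x] = position
    return pi

def polynomial_family(n, k, p):
    """Value tables of all p**k polynomials a_0 + a_1 x + ... + a_{k-1} x^(k-1)
    mod p on the points 0..n-1. For a prime p >= n these hash functions are
    k-wise independent (primality of p is not checked here)."""
    if p < n:
        raise ValueError("need p >= n so that the points 0..n-1 are distinct mod p")
    return [[sum(a * pow(x, i, p) for i, a in enumerate(coeffs)) % p for x in range(n)]
            for coeffs in product(range(p), repeat=k)]

def induced_family(n, k, p):
    """The family of permutations of range(n) induced by polynomial_family(n, k, p)."""
    return [tuple(induced_permutation(h)) for h in polynomial_family(n, k, p)]
```

For example, `induced_permutation([5, 2, 5, 0])` sorts the pairs as (0, 3), (2, 1), (5, 0), (5, 2) and so returns `[2, 1, 3, 0]`.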



Using the above proposition, an approximately min-wise independent family can be constructed as follows. Let $r = \lceil \log(c_2 n/\epsilon) \rceil$. We need a hash function that associates to each element in $[n]$ an $r$-bit string. We construct a string of length $n \cdot r$ bits, representing the concatenation of all the hash values, such that the bits are $r c_1 \log(1/\epsilon)$-wise independent. Thus the hash values for any $c_1 \log(1/\epsilon)$ elements are independent. Proposition 1 ensures that the family of permutations induced by this construction is approximately min-wise independent. Moreover, we can use the constructions of almost $k$-wise independent random variables due to Alon et al. [1]. The fact that the bits will be only approximately $k$-wise independent can be absorbed into the relative error for the approximately min-wise independent family of permutations. As noted in [1], the construction of the appropriate approximately independent bit strings can be performed in NC, implying that the construction of an approximately min-wise independent family
of permutations can be performed in NC. The size of the family of permutations 
obtained is 

Hence in what follows we will use the fact that there exist NC-constructible 
approximately min-wise independent families of permutations of size , 



3 The parallel set cover algorithm 

3.1 The problem 

The set cover problem is as follows: given a collection of sets over a universe of $n$ elements, and given an associated cost for each set, find the minimum cost sub-collection of sets that covers all of the $n$ elements. This problem (with unit costs) is included in Karp’s famous 1972 list [10] of NP-complete problems. (See also [8].)

Preprocess.
Iteration:
  For each not-covered element $e$, compute $\mathrm{value}(e)$.
  For each set $S$, include $S$ in $\mathcal{L}$ if $\sum_{e \in U(S)} \mathrm{value}(e) \ge C_S/2$.
  Phase:
    (a) Permute $\mathcal{L}$ at random.
    (b) Each not-covered element $e$ votes for the first set $S$ in the random order such that $e \in S$.
    (c) If $\sum_{e \text{ votes for } S} \mathrm{value}(e) \ge C_S/16$, then add $S$ to the set cover.
    (d) Remove from $\mathcal{L}$ any set not satisfying $\sum_{e \in U(S)} \mathrm{value}(e) \ge C_S/2$.
  Repeat until $\mathcal{L}$ is empty.
Iterate until all elements are covered.

Fig. 1. The RV-algorithm for parallel set cover

The natural greedy algorithm repeatedly adds to the cover the set that minimizes the average cost per newly added element. In other words, if the cost of set $S$ is $C_S$, then at each step we add the set that minimizes $C_S/|U(S)|$, where $U(S)$ is the subset of $S$ consisting of elements not yet covered. The greedy algorithm yields an $H_n$ factor approximation, where $H_n = \sum_{1 \le i \le n} 1/i$ denotes the $n$th harmonic number. For more on the history of this problem, see [12] and references therein. In particular Feige [7] has shown that improving this approximation is unlikely to be computationally feasible.
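A direct sketch of the greedy rule follows, with sets given as a dict from (hypothetical) names to Python sets and a dict of positive costs; it assumes that the union of the sets is the whole universe, otherwise `min` is called on an empty list and raises `ValueError`.

```python
def greedy_set_cover(universe, sets, costs):
    """Greedy set cover: repeatedly add the set S minimising C_S / |U(S)|, where
    U(S) is the part of S not yet covered. Returns the chosen set names in order."""
    uncovered = set(universe)
    cover = []
    while uncovered:
        useful = [S for S in sets if sets[S] & uncovered]
        best = min(useful, key=lambda S: costs[S] / len(sets[S] & uncovered))
        cover.append(best)
        uncovered -= sets[best]
    return cover
```

Each step re-evaluates $C_S/|U(S)|$ against the shrinking set of uncovered elements, which is exactly the sequential bottleneck that the parallel algorithm below relaxes.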



3.2 A parallel algorithm 

The RV-algorithm is a natural modification of the greedy algorithm: instead 
of repeatedly choosing the set that covers elements at the minimum average 
current-cost, repeatedly choose some sets randomly from all sets with a suitably 
low minimum average current-cost. The intuition is that choosing several sets 
at a time ensures fast progress towards a solution; randomness is used in an 
ingenious way to ensure a certain amount of coordination so that not too many 
superfluous sets (that is, sets that cover few, if any, new elements) are used. 

Define the value of an element to be:

$$\mathrm{value}(e) = \min_{S \ni e} \frac{C_S}{|U(S)|}.$$
That is, the value of an element is the minimum possible cost to add it to the current cover. The algorithm of Rajagopalan and Vazirani is depicted in Fig. 1.
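A sequential Python simulation of Fig. 1 may help fix the control flow. It is a sketch under stated assumptions: the preprocessing step is omitted, costs are positive, every element lies in some set, step (a) uses Python's `random.shuffle`, and the function and helper names are ours. In every phase the first set of the order collects the votes of all its uncovered elements, whose values sum to at least $C_S/2$, so it passes test (c); this guarantees that each phase makes progress and the simulation terminates.

```python
import random

def rv_set_cover(universe, sets, costs, seed=0):
    """Sequential simulation of Fig. 1 (preprocessing omitted).

    sets: dict name -> frozenset of elements; costs: dict name -> positive cost.
    Assumes every element of `universe` lies in at least one set."""
    rng = random.Random(seed)
    universe = set(universe)
    covered, cover = set(), []

    def uncovered_part(S):                    # U(S)
        return sets[S] - covered

    while universe - covered:                 # one iteration
        value = {e: min(costs[S] / len(uncovered_part(S)) for S in sets if e in sets[S])
                 for e in universe - covered}

        def heavy(S):                         # sum_{e in U(S)} value(e) >= C_S / 2
            part = uncovered_part(S)
            return bool(part) and sum(value[e] for e in part) >= costs[S] / 2

        candidates = [S for S in sets if heavy(S)]   # the list L
        while candidates:                     # one phase
            order = candidates[:]
            rng.shuffle(order)                # (a) random permutation of L
            votes = {S: 0.0 for S in order}
            for e in universe - covered:      # (b) vote for the first set containing e
                first = next((S for S in order if e in sets[S]), None)
                if first is not None:
                    votes[first] += value[e]
            for S in order:                   # (c) pick sets with enough vote value
                if votes[S] >= costs[S] / 16:
                    cover.append(S)
                    covered |= sets[S]
            candidates = [S for S in candidates if heavy(S)]   # (d)
    return cover
```

The values are computed once per iteration, as in Fig. 1, while $U(S)$ shrinks from phase to phase; this is why step (d) must re-test every set in $\mathcal{L}$.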

The preprocessing step is used to guarantee that the costs $C_S$ lie in a limited range; this is not of concern here since it does not involve any randomization. The randomization comes into play when the sets of $\mathcal{L}$ are randomly permuted, and each element votes for the first set in the random order. This property is exploited in the analysis of the algorithm in two ways:

1. The set that each element votes for is equally likely to be any set that contains it.

2. Given any pair of elements $e$ and $f$, let $N_e$ be the number of sets containing $e$ but not $f$, let $N_f$ be the number of sets containing $f$ but not $e$, and let $N_b$ be the number of sets that contain both. The probability that both $e$ and $f$ vote for the same set is
$$\frac{N_b}{N_e + N_b + N_f}.$$
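Property 2 can be confirmed by brute force for small counts. The sketch below (a hypothetical helper, exponential in the number of sets) enumerates every ordering of the $N_e + N_b + N_f$ sets and counts how often $e$ and $f$ vote for the same set; the answer agrees with $N_b/(N_e + N_b + N_f)$ because the two votes coincide exactly when a shared set comes first among all sets containing $e$ or $f$.

```python
from fractions import Fraction
from itertools import permutations

def same_vote_probability(n_e, n_b, n_f):
    """Exact probability, over a uniformly random order of n_e + n_b + n_f sets,
    that e and f vote for the same set. Sets are labelled 'e' (contain e only),
    'b' (contain both) or 'f' (contain f only); each element votes for the
    first set in the order that contains it."""
    labels = ["e"] * n_e + ["b"] * n_b + ["f"] * n_f
    same = total = 0
    for order in permutations(range(len(labels))):
        vote_e = next(i for i in order if labels[i] in ("e", "b"))
        vote_f = next(i for i in order if labels[i] in ("f", "b"))
        same += vote_e == vote_f
        total += 1
    return Fraction(same, total)
```

For instance, with $N_e = 2$, $N_b = 1$, $N_f = 1$ the helper returns $1/4$.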

Interestingly, both of these properties would hold if $\mathcal{L}$ were permuted according to a min-wise independent family of permutations; in fact, this is all that is required in the original analysis. Hence if we had a polynomial sized min-wise independent family, we could derandomize the algorithm immediately. Unfortunately, the lower bounds proven in [4] show that no such family exists; any min-wise independent family would have size exponential in $|\mathcal{L}|$.

We therefore consider what happens when we replace step (a) of the parallel set cover algorithm with the following step:

(a’) Permute $\mathcal{L}$ using a random permutation from an approximately min-wise independent family with error $\epsilon$.

As we shall explain, for suitably small $\epsilon$ this replacement does not affect the correctness of the algorithm, and the running time increases at most by a constant factor. Using this fact, we will be able to derandomize the algorithm using Indyk’s polynomial-sized construction.

4 The derandomization 

We note that the proof of the approximation factor of the algorithm, as well as the bound on the number of iterations, does not change when we change how the permutation on $\mathcal{L}$ is chosen. Hence we refer the interested reader to the proofs in [12], and consider only the crux of the argument for the derandomization, namely the number of phases necessary for each iteration.

As in [12], we establish an appropriate potential function $\Phi$, and show that its expected decrease $\Delta\Phi$ in each phase is at least $c\Phi$ for some constant $c$. The potential function is such that if it ever becomes 0 we are done. In [12], this was used to show that $O(\log n)$ phases per round are sufficient, with high probability. By using a polynomial sized family of approximately min-wise independent permutations, we can try all possible permutations (on a sufficiently large number of processors) in each phase. Since the expected decrease over a permutation drawn from the family is at least $c\Phi$, at least one member of the family achieves this decrease; in this way we ensure that in each phase the potential decreases by a constant factor. This derandomizes the algorithm.
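One derandomized phase can be sketched as follows: simulate steps (b) and (c) under every ordering in the family and keep the ordering that leaves the smallest potential, taken here to be $\sum_{S \in \mathcal{L}} |U(S)|$ as in the argument reviewed next. This is an illustration with assumed data structures and a hypothetical function name; `family` stands for the orderings of $\mathcal{L}$ induced by an approximately min-wise independent family, and in the parallel algorithm the members would be tried simultaneously rather than in a loop.

```python
def derandomized_phase(L, sets, costs, value, uncovered, family):
    """Return (order, picked) for the member of `family` (each an ordering of the
    set names in L) whose phase leaves the smallest potential sum_{S in L} |U(S)|."""
    best = None
    for order in family:
        votes = {S: 0.0 for S in order}
        for e in uncovered:
            # (b) e votes for the first set in this order that contains it
            first = next((S for S in order if e in sets[S]), None)
            if first is not None:
                votes[first] += value[e]
        # (c) sets whose voters carry at least C_S / 16 of value join the cover
        picked = [S for S in order if votes[S] >= costs[S] / 16]
        still_uncovered = set(uncovered).difference(*(sets[S] for S in picked))
        phi = sum(len(sets[S] & still_uncovered) for S in L)
        if best is None or phi < best[0]:
            best = (phi, order, picked)
    return best[1], best[2]
```

Because the minimum over the family is at most the average, the chosen ordering decreases the potential by at least the expected amount $E(\Delta\Phi)$.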

We review the argument with the necessary changes. The potential function is $\Phi = \sum_{S \in \mathcal{L}} |U(S)|$. The degree of an element $e$, denoted $\deg(e)$, is the number of sets containing it. A set-element pair $(S, e)$ with $e \in U(S)$ is called good if $\deg(e) \ge \deg(f)$ for at least 3/4 of the elements $f \in U(S)$. We show that on average a constant fraction of the good $(S, e)$ pairs disappear in each phase (because sets are added to the cover), from which we can easily show that $E(\Delta\Phi) \ge c\Phi$.
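The potential and the good pairs can be computed directly. In this sketch (assumed representation, hypothetical function name), $\deg(e)$ is taken to count the sets of $\mathcal{L}$ whose uncovered part contains $e$, and the condition "at least 3/4 of the $f \in U(S)$" is tested in integer arithmetic.

```python
def potential_and_good_pairs(L, sets, uncovered):
    """Return (Phi, good) where Phi = sum_{S in L} |U(S)| and good lists the pairs
    (S, e), e in U(S), such that deg(e) >= deg(f) for at least 3/4 of f in U(S).
    Here deg(e) counts the sets S in L with e in U(S) (an assumed reading)."""
    U = {S: set(sets[S]) & set(uncovered) for S in L}
    deg = {e: sum(1 for S in L if e in U[S]) for e in uncovered}
    phi = sum(len(U[S]) for S in L)
    # 4 * (number of f with deg(e) >= deg(f)) >= 3 * |U(S)| is "at least 3/4".
    good = [(S, e) for S in L for e in U[S]
            if 4 * sum(1 for f in U[S] if deg[e] >= deg[f]) >= 3 * len(U[S])]
    return phi, good
```

In every $U(S)$ the elements in the top quarter by degree form good pairs, so at least $\Phi/4$ of the pairs counted by $\Phi$ are good, which is the fact used at the end of the proof of Lemma 4.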

Lemma 2. Let $e, f \in U(S)$ with $\deg(e) \ge \deg(f)$. Then
$$\Pr(f \text{ votes for } S \mid e \text{ votes for } S) \ge \frac{1-\epsilon}{2(1+\epsilon)}.$$

Proof. Let $N_e$ be the number of sets containing $e$ but not $f$, let $N_f$ be the number of sets containing $f$ but not $e$, and let $N_b$ be the number of sets that contain both. The set $S$ is chosen by both $e$ and $f$ if it is the smallest choice for both of them, that is, the first of the $N_e + N_b + N_f$ sets containing $e$ or $f$; this happens with probability at least $\frac{1-\epsilon}{N_e+N_b+N_f}$, by the definition of approximate min-wise independence. Similarly, the set $S$ is chosen by $e$ with probability at most $\frac{1+\epsilon}{N_e+N_b}$. Hence
$$\Pr(f \text{ votes for } S \mid e \text{ votes for } S) \ge \frac{1-\epsilon}{1+\epsilon} \cdot \frac{N_e+N_b}{N_e+N_b+N_f} \ge \frac{1-\epsilon}{2(1+\epsilon)}.$$
The last inequality follows from the fact that $N_e \ge N_f$, since $\deg(e) = N_e + N_b \ge N_f + N_b = \deg(f)$.

The above lemma suggests that if $(S, e)$ is good and $e$ votes for $S$, then $S$ should get many votes. Indeed, this is the case.

Lemma 3. If $(S, e)$ is good then
$$\Pr(S \text{ is picked} \mid e \text{ votes for } S) \ge \frac{1-4\epsilon}{15}.$$

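Proof. Clearly $\mathrm{value}(f) \le C_S/|U(S)|$ for any $f \in U(S)$. Since $(S, e)$ is good, at most a quarter of the elements of $U(S)$ have degree larger than $\deg(e)$, so
$$\sum_{f \in U(S),\ \deg(f) > \deg(e)} \mathrm{value}(f) \le \frac{C_S}{4}.$$
But if $S \in \mathcal{L}$, then $\sum_{f \in U(S)} \mathrm{value}(f) \ge C_S/2$. Therefore
$$\sum_{f \in U(S),\ \deg(f) \le \deg(e)} \mathrm{value}(f) \ge \frac{C_S}{4}.$$
By Lemma 2, if $e$ votes for $S$, then each $f$ with $\deg(f) \le \deg(e)$ votes for $S$ with probability at least $(1-\epsilon)/(2(1+\epsilon))$. Hence, conditioned on $e$ voting for $S$, the expected total value of all elements that vote for $S$ is at least $C_S(1-\epsilon)/(8(1+\epsilon))$. Let $p$ be the probability that $S$ is picked in this case. The total value from all elements that vote for $S$ is at most $C_S$, and it is less than $C_S/16$ whenever $S$ is not picked, so clearly
$$pC_S + (1-p)\frac{C_S}{16} \ge \frac{C_S(1-\epsilon)}{8(1+\epsilon)}.$$
From this we obtain that $p \ge \frac{1-3\epsilon}{15(1+\epsilon)} \ge \frac{1-4\epsilon}{15}$.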



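From the lemma above we show that the expected decrease in the potential function is a constant fraction per round.

Lemma 4. $E(\Delta\Phi) \ge \frac{1-5\epsilon}{60}\,\Phi$.

Proof. As in [12], we estimate the decrease in $\Phi$ due to each pair $(S, e)$ when $e$ votes for $S$ and $S$ joins the cover. The associated decrease is $\deg(e)$ since $\Phi$ decreases by one for every remaining set that contains $e$. Hence
$$\begin{aligned}
E(\Delta\Phi) &\ge \sum_{(S,e):\, e \in U(S)} \Pr(e \text{ voted } S \text{ and } S \text{ was picked}) \cdot \deg(e) \\
&\ge \sum_{(S,e) \text{ good}} \Pr(e \text{ voted } S) \cdot \Pr(S \text{ was picked} \mid e \text{ voted } S) \cdot \deg(e) \\
&\ge \sum_{(S,e) \text{ good}} \frac{1-\epsilon}{\deg(e)} \cdot \frac{1-4\epsilon}{15} \cdot \deg(e) \\
&\ge \sum_{(S,e) \text{ good}} \frac{1-5\epsilon}{15} \\
&\ge \frac{1-5\epsilon}{60} \sum_{(S,e):\, e \in U(S)} 1 = \frac{1-5\epsilon}{60}\,\Phi.
\end{aligned}$$
Here $\Pr(e \text{ voted } S) \ge (1-\epsilon)/\deg(e)$ by approximate min-wise independence, $(1-\epsilon)(1-4\epsilon) \ge 1-5\epsilon$, and the last step uses the fact that for every $S$ at least a quarter of the elements $e \in U(S)$ (those of largest degree) form good pairs $(S, e)$.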


If initially we have $n$ sets and $m$ elements, then initially $\Phi \le mn$, and hence we may conclude that at most $O(\log nm)$ phases are required before an iteration completes. Given the results of [12], we may conclude:

Theorem 5. The algorithm Parallel Set Cover can be derandomized to an NC algorithm that approximates set cover within a factor of $16H_n$ using a polynomial number of processors.



One may trade off the number of processors and a constant factor in the running time by varying the error $\epsilon$. However, the family must be sufficiently large so that $\epsilon$ is small enough for the analysis to go through. Having $\epsilon < 1/5$ is sufficient (this can be improved easily, at least to $\epsilon < 1/3$).
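The bound on the number of phases can be made explicit. Assume, as in the derandomized algorithm, that each phase uses the best member of the family, so that by Lemma 4 the potential drops by at least $c\Phi$ with $c = (1-5\epsilon)/60$; this constant is positive exactly when $\epsilon < 1/5$. Starting from $\Phi_0 \le mn$,

$$\Phi_t \le (1 - c)^t\, \Phi_0 \le e^{-ct}\, mn, \qquad c = \frac{1-5\epsilon}{60},$$

so $\Phi_t < 1$, and hence $\Phi_t = 0$ because $\Phi$ is a nonnegative integer, as soon as $t > 60 \ln(mn)/(1-5\epsilon)$. For any fixed $\epsilon < 1/5$ this is $O(\log nm)$ phases.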



5 Extensions 

Besides the parallel set cover algorithm, Rajagopalan and Vazirani also provide algorithms for the more general set multi-cover and multi-set multi-cover problems. In the set multi-cover problem, each element has a requirement $r_e$, and it must be covered $r_e$ times. In the multi-set multi-cover problem, multi-sets are allowed. These algorithms follow the same basic paradigm as the parallel set cover algorithm, except that during the algorithm an element that still needs to be covered $r(e)$ more times gets $r(e)$ votes. (Note $r(e)$ is dynamic; $r(e) = r_e$ initially.)

Our derandomization approach using approximately min-wise independent families of permutations generalizes to these extensions as well, subject to a technical limitation that the initial requirements $r_e$ must be bounded by a fixed constant. We need slightly more than approximate min-wise independence, however. The following properties are sufficient²:

- the ordered $r(e)$-tuple of the first $r(e)$ sets containing an element $e$ in the random order is equally likely to be any ordered $r(e)$-tuple of sets that contain $e$,

- for any pair of elements $e$ and $f$ both in some set $S$, the ordered $(r(e)+r(f)-1)$-tuple of the first $r(e)+r(f)-1$ sets containing either $e$ or $f$ in the random order is equally likely to be any ordered $(r(e)+r(f)-1)$-tuple of sets that contain either $e$ or $f$.

Note that when $r(e) = r(f) = 1$, these conditions are implied by min-wise independence, as we would expect.

These requirements suggest a natural generalization of min-wise independence: suppose that not just any element of a set $X$ was equally likely to be the first after applying a permutation, but that any ordered set of $k$ elements of $X$ is equally likely to be the first $k$ elements (in the correct order) after applying a permutation to $X$. Let us call this $k$-minima-wise independence. Then the properties above correspond to $\max_{e,f}(r(e)+r(f)-1)$-minima-wise independence; if $\max_e r(e)$ is a fixed constant, then we require a $k$-minima-wise independent family of permutations for some constant $k$. In fact, as with the parallel set cover problem, we require only approximate $k$-minima-wise independence, and the construction of Indyk can be generalized to give us an appropriate family of polynomial size when $k$ is a constant.
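For tiny $n$, $k$-minima-wise independence can be tested by brute force, generalizing the min-wise check. The sketch below (hypothetical name) measures the error in the same relative form as (2), which is our assumed reading of "approximate $k$-minima-wise independence"; with $k = 1$ it reduces to the min-wise error.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, permutations

def k_minima_error(family, n, k):
    """Smallest eps such that, for every X subset of range(n) with |X| >= k and every
    ordered k-tuple T of distinct elements of X, the probability (pi uniform over
    `family`) that the k smallest images of X come from T, in order, is within
    eps / N of 1 / N, where N = |X|! / (|X| - k)!. Returns 0 iff the family is
    exactly k-minima-wise independent."""
    worst = Fraction(0)
    for size in range(k, n + 1):
        for X in combinations(range(n), size):
            # Ordered tuple of the k elements of X with smallest images, per permutation.
            counts = Counter(tuple(sorted(X, key=lambda y: p[y])[:k]) for p in family)
            tuples = list(permutations(X, k))
            N = len(tuples)
            for T in tuples:
                prob = Fraction(counts[T], len(family))
                worst = max(worst, N * abs(prob - Fraction(1, N)))
    return worst
```

The full symmetric group gives error 0 for every $k$, whereas the three cyclic shifts of $\{0, 1, 2\}$, which reach only 3 of the 6 ordered pairs, give error 1 for $k = 2$.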

We note in passing that for estimating the resemblance of documents as in [2] and [5] with a “sketch” of size $k$ we need one sample from a $k$-minima-wise independent family, while for the method presented in [3], we need $k$ separate samples from a min-wise independent family.

There is an interesting meta-principle behind our derandomizations, which 
appears worth emphasizing here. 

Remark 6. Let $\mathcal{E}$ be an event that depends only on the order of the first $k$ elements of a random permutation. Then any bound on the probability of $\mathcal{E}$ that holds for random permutations also holds for any $k$-minima-wise independent family. Moreover, for any approximately $k$-minima-wise independent family, a suitable small correction to the bound holds.

² In fact they are more than is necessary; however, stating the properties in this form is convenient.
For example, many of the lemmata in [12] prove bounds for events assuming that the random permutations are generated by assigning each set a uniform random variable from [0, 1] and then sorting. Because the events these lemmata bound depend only on the first $r(e)+r(f)-1$ sets of the permutation, the lemmata still hold when using $(r(e)+r(f)-1)$-minima-wise independent families, and only minor corrective terms need to be introduced for approximately $(r(e)+r(f)-1)$-minima-wise independent families. Hence given the results of [12], the derandomizations follow with relatively little work.

6 Conclusion 

We have demonstrated a novel derandomization using the explicit construction of 
approximate min-wise independent families of permutations of polynomial size. 
We expect that this technique may prove useful for further derandomizations. 

The question of how to best construct small approximately min-wise indepen- 
dent families of permutations remains open. Improvements in these constructions 
would lead to improvements in the number of processors required for our deran- 
domizations here, and more generally may enhance the utility of this technique. 
Abstract. Recently Komlós, Sárközy, and Szemerédi proved a striking result called the blow-up lemma that, loosely speaking, enables one to embed any bounded degree graph $H$ as a spanning subgraph of an $\epsilon$-regular graph $G$. The first proof given by Komlós, Sárközy, and Szemerédi was based on a probabilistic argument [8]. Subsequently, they derandomized their approach to provide an algorithmic embedding in [9]. In this paper we give a different proof of the algorithmic version of the blow-up lemma. Our approach is based on a derandomization of a probabilistic proof of the blow-up lemma given in [13]. The derandomization utilizes the Erdős-Selfridge method of conditional probabilities and the technique of pessimistic estimators.



1 Introduction 

Given a graph $G$ and two disjoint subsets $U$ and $W$ of its vertex set $V(G)$, denote by $e_G(U, W)$ the number of edges of $G$ with one endpoint in $U$ and the other in $W$. Define the density $d_G(U, W)$ of the pair $(U, W)$ in $G$ by $d_G(U, W) = \frac{e_G(U, W)}{|U||W|}$.

Let $\epsilon > 0$, and let $G = (V_1, V_2; E)$ be a bipartite graph. $G$ is called $\epsilon$-regular if for every pair of sets $(U, W)$, $U \subseteq V_1$, $W \subseteq V_2$, $|U| \ge \epsilon|V_1|$, $|W| \ge \epsilon|V_2|$,
$$|d_G(U, W) - d_G(V_1, V_2)| < \epsilon. \qquad (1)$$

Let $\epsilon > 0$ and $0 < d < 1$. A bipartite graph $G = (V_1, V_2; E)$ is called super $(d, \epsilon)$-regular if the following conditions hold:

(i) $G$ is $\epsilon$-regular,

(ii) for each $v \in V_i$, $(d - \epsilon)|V_{3-i}| < \deg(v) < (d + \epsilon)|V_{3-i}|$, $i = 1, 2$.
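The definitions can be checked mechanically on very small graphs. The sketch below (hypothetical names) enumerates all subsets, so it is exponential and only meant for illustration; it assumes the two sides use distinct vertex labels, stores edges as pairs $(u, w)$ with $u \in V_1$, $w \in V_2$, and uses the strict inequalities of (1) and (ii).

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import combinations

def density(edges, U, W):
    """d(U, W) = e(U, W) / (|U| |W|); `edges` is a set of pairs (u, w), u in V1, w in V2."""
    count = sum(1 for u in U for w in W if (u, w) in edges)
    return Fraction(count, len(U) * len(W))

def large_subsets(V, eps):
    """All subsets of V with at least eps * |V| elements (eps > 0, so never empty)."""
    lo = max(1, math.ceil(eps * len(V)))
    for size in range(lo, len(V) + 1):
        yield from combinations(sorted(V), size)

def is_eps_regular(edges, V1, V2, eps):
    """Brute-force test of (1) for every pair of large subsets."""
    d = density(edges, V1, V2)
    return all(abs(density(edges, U, W) - d) < eps
               for U in large_subsets(V1, eps) for W in large_subsets(V2, eps))

def is_super_regular(edges, V1, V2, d, eps):
    """Super (d, eps)-regularity: eps-regular and every degree strictly between
    (d - eps) and (d + eps) times the size of the opposite side."""
    if not is_eps_regular(edges, V1, V2, eps):
        return False
    deg = Counter()
    for u, w in edges:
        deg[u] += 1
        deg[w] += 1
    return (all((d - eps) * len(V2) < deg[v] < (d + eps) * len(V2) for v in V1)
            and all((d - eps) * len(V1) < deg[v] < (d + eps) * len(V1) for v in V2))
```

For example, $K_{3,3}$ is super $(9/10, 1/5)$-regular, while a perfect matching between two 3-sets has density $1/3$ but is not $3/10$-regular, since a single vertex and a non-partner have density 0.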

Super $(d, \epsilon)$-regular graphs with $|V_1| = |V_2| = n$ and $d > 2\epsilon$ satisfy the Hall condition and thus contain a perfect matching. In fact, as has been shown in [2], the number of perfect matchings in such graphs is close to $d^n n!$.

Given two graphs $H$ and $G$ on the same number of vertices, we call a bijection $f : V(H) \to V(G)$ an embedding of $H$ into $G$ if $f$ maps every edge of $H$ onto an edge of $G$. In other words, $f$ is an isomorphism between $H$ and a subgraph of $G$. In [8] the following result was proved.
Theorem 1 (Blow Up Lemma). For every choice of positive integers $r$ and $\Delta$, and $0 < d < 1$, there exist a $\delta > 0$ and an integer $n = n(\delta)$ such that the following holds. Let $G$ be an $r$-partite graph with all partition sets $V_1, \ldots, V_r$ of order $n$ and all $\binom{r}{2}$ bipartite subgraphs $G[V_i, V_j]$ super $(d, \delta)$-regular. Then, for every $r$-partite graph $H$ with maximum degree $\Delta(H) \le \Delta$ and all partition sets $X_1, \ldots, X_r$ of order $n$, there exists an embedding $f$ of $H$ into $G$ that maps $X_i$ onto $V_i$, $i = 1, 2, \ldots, r$.

Hence, it is possible to embed any graph $H$ with bounded maximum degree as a spanning subgraph of a dense, $\epsilon$-regular graph $G$. The Blow Up Lemma together with the Regularity Lemma of Szemerédi [15] enables one to tackle and solve difficult problems like the Pósa-Seymour conjecture on powers of Hamiltonian cycles [10] or the Alon-Yuster conjecture on perfect $F$-matchings [11]. Since the Regularity Lemma has already been made algorithmic in [1], it is very important to provide a constructive version of the Blow Up Lemma too.

The idea of the original proof of Theorem 1 (cf. [8]) involves sequentially embedding the vertices of $H$ into $G$ while the sets of candidates for images of unembedded vertices are not threatened. When all but a small fraction of the vertices are already embedded, the remaining vertices are all embedded at once using the König-Hall theorem. In [13] an alternative proof was proposed, using random perfect matchings of super regular graphs, in which $H$ is embedded into $G$ in only a constant number of rounds. In this paper we present an algorithmic version of that proof. (An algorithmic version of the original proof of Theorem 1 can be found in [9].)

Theorem 2. There is an algorithm EMBED which, in time polynomial in $n$, does the following. Given an $r$-partite graph $G$ with all partition sets $V_1, \ldots, V_r$ of order $n$ and all $\binom{r}{2}$ bipartite subgraphs $G[V_i, V_j]$ super $(d, \delta)$-regular, and given an $r$-partite graph $H$ with maximum degree $\Delta(H) \le \Delta$ and all partition sets $X_1, \ldots, X_r$ of order $n$, where $r$ and $\Delta$ are arbitrary positive integers, $0 < d < 1$, and $\delta = \delta(r, \Delta, d) > 0$ is sufficiently small, EMBED constructs an embedding $f$ of $H$ into $G$ that maps $X_i$ onto $V_i$, $i = 1, \ldots, r$.

The algorithm EMBED consists of two phases. In the preliminary Phase 1, 
outlined in Sect. 2, refinements of the partitions of V (H) and V (G) are obtained 
and new edges are added to both H and G. Then in the main Phase 2, described 
in Sect. 3, H is embedded into G with the partition sets mapped accordingly. At 
the end of Sect. 3 a more formal description of algorithm EMBED is provided. 
As a crucial ingredient, we need to derandomize a probabilistic result on random 
perfect matchings of a super regular graph (the Catching Lemma). This part, 
which we believe is of independent interest, is treated in Sect. 4. 

2 Partitions and Enlargements 

In this section we describe how to construct finer partitions of $V(H)$ and $V(G)$. This will be done by the algorithm PARTITION. We will only outline its two major steps. As an input, $H$ and $G$ are the graphs described in Theorem 2 with initial partitions $V(H) = X_1 \cup \cdots \cup X_r$ and $V(G) = V_1 \cup \cdots \cup V_r$, and $|X_i| = |V_i| = n$, $i = 1, \ldots, r$.



PARTITION $V(H)$:

Let $\ell$ be the unique positive integer such that
$$2\Delta^2 + 1 \le 2^\ell \le 4\Delta^2. \qquad (2)$$
Partition each $X_i$ into $t = 2^\ell$ sets $X_{i,j}$, $j = 1, \ldots, t$, of size $m$ or $m+1$, where $m = \lfloor 2^{-\ell} n \rfloor$, so that each pair of distinct sets $(X_{i_1,j_1}, X_{i_2,j_2})$ spans in $H$ a (possibly empty) matching.

To obtain the finer partition of $V(H)$, we will use an algorithmic version of a result on graph packing by Sauer and Spencer given in [14]. Given two graphs $\Gamma$ and $\Gamma'$ on the same $n$-vertex set $V$, we say that a bijection $\pi : V \to V$ is a packing of $\Gamma$ and $\Gamma'$ if $E(\Gamma) \cap \pi(E(\Gamma')) = \emptyset$, where $\pi(E(\Gamma')) = \{(\pi(u), \pi(v)) : (u, v) \in E(\Gamma')\}$.

Lemma 1. Let $\Gamma$ and $\Gamma'$ be two graphs on an $n$-vertex set $V$ satisfying $2\Delta(\Gamma)\Delta(\Gamma') < n$. There exists a polynomial-time algorithm PACK that finds a packing of $\Gamma$ and $\Gamma'$.

We apply algorithm PACK as follows. Given a graph $\Gamma$, the square of $\Gamma$, denoted by $\Gamma^2$, is the graph obtained from $\Gamma$ by joining each pair of distinct vertices of $\Gamma$ whose distance is at most 2 by an edge. Denote by $\Gamma_i$ the subgraph of $H^2$ induced by $X_i$. Set $y = n - 2^\ell m$ and let $\Gamma'$ be the vertex-disjoint union of $2^\ell - y$ cliques of order $m$ and $y$ cliques of order $m+1$. Note that $\Delta(\Gamma_i) \le \Delta^2$ and $\Delta(\Gamma') = m$, so by (2) we have $2\Delta(\Gamma_i)\Delta(\Gamma') \le 2\Delta^2 n/2^\ell < n$. Hence, applying the algorithm PACK to $\Gamma_i$ and $\Gamma'$ as defined above yields a finer partition of $X_i$ into $2^\ell$ sets $X_{i,j}$ of size $m$ or $m+1$, which are independent in $H^2$. Clearly, the edges of $H$ which go between any two such sets are pairwise disjoint. Thus, each pair of distinct sets $(X_{i_1,j_1}, X_{i_2,j_2})$ spans in $H$ a (possibly empty) matching.
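The parameter choices in PARTITION $V(H)$ can be checked numerically. The sketch below (hypothetical name) computes $\ell$ from (2), the part size $m$, and the number $y$ of cliques of order $m+1$ in $\Gamma'$; the accompanying checks confirm that $\Gamma'$ has exactly $n$ vertices and that the PACK condition $2\Delta^2 m < n$ holds, using the bound $\Delta(\Gamma_i) \le \Delta^2$ for the square.

```python
def partition_parameters(n, Delta):
    """Return (ell, t, m, y) for PARTITION V(H): ell is the unique integer with
    2*Delta**2 + 1 <= 2**ell <= 4*Delta**2, t = 2**ell, m = floor(n / t) and
    y = n - t*m (the number of parts, and cliques of Gamma', of order m + 1)."""
    ell = (2 * Delta ** 2).bit_length()   # smallest ell with 2**ell > 2*Delta**2
    t = 2 ** ell
    assert 2 * Delta ** 2 + 1 <= t <= 4 * Delta ** 2
    m = n // t
    y = n - t * m
    return ell, t, m, y
```

For example, with $\Delta = 3$ and $n = 1000$ we get $\ell = 5$, $t = 32$, $m = 31$ and $y = 8$, and indeed $2 \cdot 9 \cdot 31 = 558 < 1000$.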

PARTITION $V(G)$:

Partition each $V_i$ into $t = 2^\ell$ sets $V_{i,j}$, $j = 1, 2, \ldots, t$, of size $m$ or $m+1$ so that $|V_{i,j}| = |X_{i,j}|$, $i = 1, 2, \ldots, r$, $j = 1, 2, \ldots, t$, and so that each pair of distinct sets $(V_{i_1,j_1}, V_{i_2,j_2})$, $i_1 \ne i_2$, spans a super $(d, 2(\Delta^2+1)\delta)$-regular subgraph.

As the resulting partition sets have size at least $m = \lfloor 2^{-\ell} n \rfloor$, the $2(\Delta^2+1)\delta$-regularity of all pairs follows immediately from the assumption that all pairs $(V_i, V_j)$ are $\delta$-regular. To establish the super regularity we have to control how the degree of every vertex splits between the new partition sets. For this we recall a slight extension of a result of Alon and Spencer given in [3]. It derandomizes a standard application of Chernoff’s bound.
Lemma 2. Let $\mathcal{F}$ be a family of $k$ subsets of an $n$-element set $\Omega$. There exists a polynomial-time algorithm HALF that partitions $\Omega$ into $\Omega = \Omega^+ \cup \Omega^-$ such that for each $F \in \mathcal{F}$,
$$\frac{|F|}{2} - \beta \le |F \cap \Omega^{\pm}| \le \frac{|F|}{2} + \beta, \qquad (3)$$
where $\beta = \sqrt{2n \log(2k)}$.
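Below is a sketch of one standard way to realize HALF: the method of conditional expectations applied to a uniformly random $\pm 1$ coloring, with the estimator $\sum_{F} 2\cosh(\lambda s_F)$. It is our illustration of the cited Alon-Spencer technique, not necessarily the paper's exact procedure, and the function name is hypothetical. With $\lambda = \sqrt{2\ln(2k)/n}$ (natural logarithm assumed) the signed discrepancy $s_F = |F \cap \Omega^+| - |F \cap \Omega^-|$ of every set stays within $\beta$, so each $|F \cap \Omega^{\pm}|$ is within $\beta/2$ of $|F|/2$.

```python
import math

def half(n, family):
    """Split range(n) into (plus, minus) so that for every F in `family`
    | |F & plus| - |F & minus| | <= sqrt(2 n ln(2k)), where k = len(family) >= 1.

    Points are coloured +1/-1 one at a time; each choice does not increase the
    conditional expectation of sum_F 2 cosh(lam * s_F) under a uniformly random
    colouring of the remaining points."""
    k = len(family)
    lam = math.sqrt(2 * math.log(2 * k) / n)
    c = math.cosh(lam)                         # E[exp(lam * X)] for a random sign X
    members = [[] for _ in range(n)]           # members[x]: indices of sets containing x
    for i, F in enumerate(family):
        for x in F:
            members[x].append(i)
    s = [0] * k                                # signed discrepancy so far
    r = [len(F) for F in family]               # uncoloured points left in each set

    def term(si, ri):                          # conditional expectation of 2 cosh(lam * s_F)
        return (math.exp(lam * si) + math.exp(-lam * si)) * c ** ri

    plus = set()
    for x in range(n):
        # Only the sets containing x change, so compare their contributions.
        up = sum(term(s[i] + 1, r[i] - 1) for i in members[x])
        down = sum(term(s[i] - 1, r[i] - 1) for i in members[x])
        sign = 1 if up <= down else -1
        if sign == 1:
            plus.add(x)
        for i in members[x]:
            s[i] += sign
            r[i] -= 1
    return plus, set(range(n)) - plus
```

The estimator starts at most $2k\cosh(\lambda)^n \le 2k e^{\lambda^2 n/2}$ and never increases, while at the end it is at least $e^{\lambda |s_F|}$ for every $F$; solving for $|s_F|$ with this $\lambda$ gives exactly $\sqrt{2n\ln(2k)}$.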

We construct the finer partition of each $V_i$ in $\ell$ iterations, where $\ell$ is defined in (2). In the first iteration, we apply HALF to $\mathcal{F} = \{V_i\} \cup \{N(v) \cap V_i : v \notin V_i\}$. Thus, $V_i$ and all of the neighborhoods in $\mathcal{F}$ get roughly halved and two new families are generated. In the second iteration, we apply HALF twice, once to each family. This generates four new families, etc.

More generally, the $j$-th iteration will consist of applying HALF once to each of the $2^{j-1}$ families that are generated in the $(j-1)$-st iteration. After the $\ell$-th iteration, each $V_i$ is partitioned into $t = 2^\ell$ sets so that each set has size roughly $m$ and also each neighborhood has shrunk in the same proportion. To ensure that the convention $|V_{i,j}| = |X_{i,j}|$ is satisfied for all $i = 1, \ldots, r$ and $j = 1, \ldots, t$, we arbitrarily move vertices around. This affects the degrees very little. The cumulative error introduced by this procedure is only $O(\beta) = o(n)$. The super $(d, 2(\Delta^2+1)\delta)$-regularity of the subgraphs spanned by pairs of these refined partition sets follows easily.

Below we outline a simple procedure ENLARGE which is performed merely for convenience. It smooths out the recursive embedding described in the next section.

Given a bipartite graph $\Gamma = (U, V; E)$, a matching $M \subseteq E$ is called saturating if $|M| = \min(|U|, |V|)$.



ENLARGE $E(H)$ and $E(G)$:

(i) Add edges to form a supergraph $H'$ of $H$ so that each pair $(X_{i_1,j_1}, X_{i_2,j_2})$ spans a saturating matching.

(ii) Add edges between the pairs $(V_{i_1,j_1}, V_{i_2,j_2})$, $i_1 = i_2$ (i.e. pairs with density 0), to form a supergraph $G'$ of $G$ so that each such pair spans in $G'$ a super $(d, 2(\Delta^2+1)\delta)$-regular subgraph.

In step (ii), we may insert between each pair $(V_{i_1,j_1}, V_{i_2,j_2})$, $i_1 = i_2$, the same (our favorite) super $(d, 2(\Delta^2+1)\delta)$-regular graph.

Note that every embedding of $H'$ into $G'$ that maps $X_{i,j}$ onto $V_{i,j}$ for all $i$ and $j$ yields a desired embedding of $H$ into $G$.



3 Embedding 

After the finer partitions of $V(H)$ and $V(G)$ have been obtained, we rename the sets $X_{i,j}$ and $V_{i,j}$ as $Y_k$ and $W_k$, respectively, and restore the notation $H$ and $G$ for $H'$ and $G'$. More precisely, with $s = rt$, $H$ is now an $s$-partite graph with partition $V(H) = Y_1 \cup \cdots \cup Y_s$, where $m + 1 \ge |Y_1| \ge \cdots \ge |Y_s| \ge m$, so that each pair of distinct sets $(Y_i, Y_j)$ spans a saturating matching. Similarly, $G$ is now an $s$-partite graph with partition $V(G) = W_1 \cup \cdots \cup W_s$, $|W_j| = |Y_j|$ for each $j = 1, \ldots, s$, so that each pair of distinct sets $(W_i, W_j)$ spans a super $(d, \epsilon)$-regular graph, where $\epsilon = 2(\Delta^2+1)\delta$.

The goal of this section is to outline how algorithm EMBED from Theorem 2 constructs an embedding of $H$ into $G$ so that each $Y_j$ is mapped onto the set $W_j$.

Before describing the embedding, we introduce some definitions and notation.
For every $1 \le j \le s - 1$ and each vertex $x \in Y_{j+1} \cup \dots \cup Y_s$, let $N_j(x)$ denote the
set of precisely $j$ neighbors of $x$ which belong to $Y_1 \cup \dots \cup Y_j$. Given a bijection
$f_j$ between $Y_1 \cup \dots \cup Y_j$ and $W_1 \cup \dots \cup W_j$, let $M_j(x) = f_j(N_j(x))$. Given $f_j$, for
each $t = j + 1, \dots, s$, we define a bipartite auxiliary graph $A^t_j$ with bipartition
$(Y_t, W_t)$ and edge set

$$E(A^t_j) = \{xv : x \in Y_t,\ v \in W_t \text{ and } uv \in E(G) \text{ for each } u \in M_j(x)\}. \quad (4)$$

We call the graphs $A^t_j$ candidacy graphs because the edges of $A^t_j$ join a given
vertex $x \in Y_t$ to all vertices of $W_t$ which, after $f_j$ embeds $Y_1 \cup \dots \cup Y_j$ onto
$W_1 \cup \dots \cup W_j$, are still good candidates for the image of $x$.
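The edge set (4) can be computed directly from its definition. The sketch below is illustrative only: it assumes $G$ is stored as a dictionary of neighbour sets and that $M_j(x)$ has already been computed for every $x \in Y_t$; the function name is hypothetical.

```python
def candidacy_graph(Y_t, W_t, M, G_adj):
    """Edge set of the candidacy graph A^t_j as in (4).

    Y_t, W_t : the two sides of the auxiliary bipartite graph
    M        : dict x -> set M_j(x) = f_j(N_j(x)) of already embedded images
    G_adj    : dict vertex -> set of neighbours in G
    A pair xv is an edge iff v is adjacent in G to every u in M_j(x).
    """
    return {(x, v) for x in Y_t for v in W_t
            if all(v in G_adj[u] for u in M[x])}
```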

We will embed $H$ into $G$ recursively. Let $f_1$ be any bijection between $Y_1$
and $W_1$. Assuming that there exists an embedding $f_{j-1}$ of $H[Y_1 \cup \dots \cup Y_{j-1}]$
into $G[W_1 \cup \dots \cup W_{j-1}]$ such that the candidacy graphs $A^t_{j-1}$, $t = j, \dots, s$, are
super $(d^{j-1}, \varepsilon_{j-1})$-regular, extend $f_{j-1}$ to $f_j$ by constructing a perfect matching
$\sigma_j : Y_j \to W_j$ in $A^j_{j-1}$ which makes the graphs $A^t_j$, $t = j + 1, \dots, s$,
super $(d^j, \varepsilon_j)$-regular, and set $f_j(x) = f_{j-1}(x)$ if $x \in Y_1 \cup \dots \cup Y_{j-1}$ and $f_j(x) = \sigma_j(x)$ if $x \in Y_j$.
This operation of extending $f_{j-1}$ to $f_j$ will be designated by $f_j \leftarrow f_{j-1} + \sigma_j$.
(Here and throughout we view a perfect matching as a bijection along the edges
of a bipartite graph rather than as a set of edges.)

Note that the instance $j = s$ yields the desired embedding. The requirement that
all future graphs $A^t_j$ be super regular serves to carry the recursion forward. The
existence of the perfect matching $\sigma_j$ follows from a result called the Four Graphs
Lemma given in [13]. Its probabilistic ingredient is Lemma 4 below, which we
will refer to as the Catching Lemma. We will first explain how the Four Graphs
Lemma works. Then, in the final section, we will outline how to derandomize
the Catching Lemma.

Let $2 \le j < t \le s$ be fixed and consider the following graphs:

1. Let $\Gamma_1 = A^j_{j-1}$.

2. Let $\Gamma_2 = \Gamma_2^{j,t}$ be the bipartite graph spanned by the pair $(W_j, W_t)$ in $G$.

3. Let $\Gamma_3 = \Gamma_3^{j,t}$ denote a bipartite graph with bipartition $(Y_j, W_t)$ such that
$xw \in E(\Gamma_3^{j,t})$ if and only if $yw \in E(A^t_{j-1})$, where $y$ is the unique neighbor
of $x$ in $Y_t$. (If $|Y_t| = |Y_j| - 1$, then there is one vertex $x_t$ in $Y_j$ with no match
in $Y_t$; in such a case we arbitrarily join $x_t$ to $\lfloor d^{j-1}(m + 1) \rfloor$ vertices of $W_t$;
however, for clarity of exposition we do assume that $|Y_j| = |Y_t|$.)

4. Given a perfect matching $\sigma$ of $\Gamma_1$, let $A^t_\sigma$ be the subgraph of $\Gamma_3^{j,t}$ such
that $xw \in E(A^t_\sigma)$ if and only if both $xw \in E(\Gamma_3^{j,t})$ and $\sigma(x)w \in E(\Gamma_2^{j,t})$.

Observe that $\Gamma_3^{j,t}$ is isomorphic to $A^t_{j-1}$, and that $A^t_\sigma$ is isomorphic to $A^t_j$ (with $f_j = f_{j-1} + \sigma$). Observe
also that, by the induction hypothesis, both $\Gamma_1$ and $\Gamma_3^{j,t}$ are super $(d^{j-1}, \varepsilon_{j-1})$-regular
(in the "ugly" case when $|Y_t| = |Y_j| - 1$, adding the vertex $x_t$ and its
neighbors could slightly affect the super regularity of $\Gamma_3^{j,t}$; to be on the safe side
one would need to double $\varepsilon_{j-1}$). Finally, by our assumption on $G$, $\Gamma_2^{j,t}$ is super
$(d, \varepsilon)$-regular.
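The construction in item 4, and the observation used below that the degree of $w \in W_t$ in $A^t_\sigma$ equals the number of $\sigma$-edges between $N_{\Gamma_3}(w)$ and $N_{\Gamma_2}(w)$, can be checked on a toy instance. This is a sketch with hypothetical names: graphs are sets of pairs and $\sigma$ is a dictionary from $Y_j$ to $W_j$.

```python
def a_sigma(gamma3, gamma2, sigma):
    """Edges xw of A^t_sigma: xw in Gamma_3 and sigma(x)w in Gamma_2.

    gamma3 : set of pairs (x, w), x in Y_j, w in W_t
    gamma2 : set of pairs (v, w), v in W_j, w in W_t
    sigma  : dict, a perfect matching Y_j -> W_j
    """
    return {(x, w) for (x, w) in gamma3 if (sigma[x], w) in gamma2}


def degree_in_a_sigma(w, gamma3, gamma2, sigma):
    """Degree of w in A^t_sigma, counted as the number of sigma-edges
    x -> sigma(x) with x in N_Gamma3(w) and sigma(x) in N_Gamma2(w)."""
    n2 = {v for (v, z) in gamma2 if z == w}
    n3 = {x for (x, z) in gamma3 if z == w}
    return sum(1 for x in n3 if sigma[x] in n2)
```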

The Four Graphs Lemma asserts (under one additional assumption) that most
of the perfect matchings $\sigma$ of $\Gamma_1$ are such that the graphs $A^t_\sigma$, $t = j + 1, \dots, s$,
are all super $(d^j, \varepsilon_j)$-regular, where $\varepsilon_j = h(\varepsilon_{j-1})$ and $h$ is an easily computable
function which tends to 0 as its argument tends to 0. The additional requirement
is that the endpoints of each edge of $\Gamma_1$ have about $d^j m$ common neighbors in
each $W_t$. Precisely, for every edge $uv \in E(\Gamma_1)$, $u \in Y_j$, $v \in W_j$, and for each $t = j + 1, \dots, s$, we require
that

$$(d^j - \varepsilon)m < |N_{\Gamma_2}(v) \cap N_{\Gamma_3}(u)| < (d^j + \varepsilon)m. \quad (5)$$

This condition can easily be enforced by throwing away the edges of $\Gamma_1$ which fail
to have this property. We omit the description of algorithm PEEL, which does
it. The residual subgraph $\Gamma_1'$ is still super regular, with a slightly bigger second
parameter (cf. Fact 1 in [13]). From now on we will assume that $\Gamma_1$
does satisfy this additional assumption, i.e. we set $\Gamma_1 \leftarrow \Gamma_1'$. This constraint
guarantees that for every perfect matching $\sigma$ of $\Gamma_1$, and for each $t$, the degree
$\deg_{A^t_\sigma}(x)$ of each vertex $x \in Y_j$ is close to $d^j m$. Observe that $\deg_{A^t_\sigma}(v)$ for $v \in W_t$
is precisely equal to the number of edges of $\sigma$ which connect a vertex of $N_{\Gamma_2}(v)$
with a vertex of $N_{\Gamma_3}(v)$. The Catching Lemma ensures that this number is right
for most of the $\sigma$'s. Hence, provided we can derandomize the Catching Lemma,
the second condition in the definition of a super regular graph is taken care of.
To establish the $\varepsilon_j$-regularity alone of $A^t_\sigma$, we rely on the following criterion from
[1].

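Lemma 3. If $\Gamma = (U, V; E)$, $|U| = |V| = m$, is a bipartite graph with at least
$(1 - 5\varepsilon)m^2/2$ pairs of vertices $w_1, w_2 \in U$ satisfying

(i) $\deg(w_1), \deg(w_2) > (d - \varepsilon)m$, and

(ii) $|N(w_1) \cap N(w_2)| < (d + \varepsilon)^2 m$,

then $\Gamma$ is $\varepsilon'$-regular, where $\varepsilon'$ depends only on $\varepsilon$ (see [1] for the exact dependence).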
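The hypotheses of Lemma 3 can be tested by brute force over all $\binom{m}{2}$ pairs, which is what makes the criterion algorithmically convenient. The sketch below (hypothetical names) checks only the hypotheses, not the regularity conclusion; the graph is given by the neighbour sets of the vertices of $U$.

```python
from itertools import combinations


def lemma3_pair_count(U, nbrs, m, d, eps):
    """Number of unordered pairs {w1, w2} of U with
    (i)  deg(w1), deg(w2) > (d - eps) m, and
    (ii) |N(w1) & N(w2)| < (d + eps)^2 m."""
    count = 0
    for w1, w2 in combinations(U, 2):
        if (len(nbrs[w1]) > (d - eps) * m and len(nbrs[w2]) > (d - eps) * m
                and len(nbrs[w1] & nbrs[w2]) < (d + eps) ** 2 * m):
            count += 1
    return count


def lemma3_hypothesis_holds(U, nbrs, m, d, eps):
    """True iff at least (1 - 5 eps) m^2 / 2 pairs satisfy (i) and (ii)."""
    return lemma3_pair_count(U, nbrs, m, d, eps) >= (1 - 5 * eps) * m * m / 2
```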

Note that $|N_{A^t_\sigma}(v_1) \cap N_{A^t_\sigma}(v_2)|$ equals the number of edges of $\sigma$ which connect
a vertex of $N_{\Gamma_2}(v_1) \cap N_{\Gamma_2}(v_2)$ with a vertex of $N_{\Gamma_3}(v_1) \cap N_{\Gamma_3}(v_2)$. For most pairs
$v_1, v_2 \in W_t$ neither of these sets is too large (call these pairs good) and, by the
Catching Lemma again, there cannot be too many edges of a random $\sigma$ between
them. The probability of failure is exponentially small, so we are in a position
to demand that a random perfect matching of $\Gamma_1$ simultaneously inserts the right
number of edges between each pair $N_{\Gamma_2}(v)$, $N_{\Gamma_3}(v)$ for all $v \in W_t$, $t = j + 1, \dots, s$,
as well as between each pair $N_{\Gamma_2}(v_1) \cap N_{\Gamma_2}(v_2)$, $N_{\Gamma_3}(v_1) \cap N_{\Gamma_3}(v_2)$ for all good
pairs $v_1, v_2 \in W_t$, $t = j + 1, \dots, s$.

In the next section we describe an algorithm CATCH which constructs the
desired perfect matching. We wrap up this section with a description of algorithm
EMBED.



Algorithm EMBED

Input: $r$-partite graphs $H$ and $G$ as in Theorem 2.

Output: An embedding $f$ of $H$ into $G$ such that each $X_i$ is mapped onto $V_i$,
$i = 1, \dots, r$.

Phase 1:

1. Apply PARTITION to $V(H)$ and $V(G)$ and denote the output by
$V(H) = Y_1 \cup \dots \cup Y_s$ and $V(G) = W_1 \cup \dots \cup W_s$, where

$m + 1 \ge |Y_1| = |W_1| \ge \dots \ge |Y_s| = |W_s| \ge m$.

2. Apply ENLARGE to $H$ and $G$ with output $H'$ and $G'$;

$H \leftarrow H'$ and $G \leftarrow G'$.

Phase 2:

1. Let $f_1$ be any bijection from $Y_1$ to $W_1$.

2. $j \leftarrow 2$; WHILE $j \le s$ DO:

(a) Apply PEEL to $\Gamma_1$; denote the output by $\Gamma_1'$;

$\Gamma_1 \leftarrow \Gamma_1'$.

(b) Apply CATCH to the graph $\Gamma_1$ and to the pairs

$(N_{\Gamma_3^{j,t}}(v), N_{\Gamma_2^{j,t}}(v))$, $v \in W_t$, $t = j + 1, \dots, s$,

$(N_{\Gamma_3^{j,t}}(v_1) \cap N_{\Gamma_3^{j,t}}(v_2), N_{\Gamma_2^{j,t}}(v_1) \cap N_{\Gamma_2^{j,t}}(v_2))$
for all good pairs $v_1, v_2 \in W_t$, $t = j + 1, \dots, s$;
denote the output by $\sigma_j$.

(c) $f_j \leftarrow f_{j-1} + \sigma_j$; $j \leftarrow j + 1$.

3. $f \leftarrow f_s$.

4 Catching 

The Catching Lemma first appeared in [13] (cf. Lemma 1 there). The name comes from
the fact that a pair of sets catches between them a number of edges of a random
perfect matching which is close to its expectation. In our application to the Four
Graphs Lemma described in Sect. 3, we actually have $k = O(m^2)$ of these pairs.

Lemma 4 (Catching Lemma). For every choice of three real numbers $0 < d, d_1, d_2 < 1$
there exist $\varepsilon > 0$, $c = c(\varepsilon)$ with $0 < c < 1$, $m_0(\varepsilon)$ and a function
$g(x) \to 0$ as $x \to 0$ such that the following holds. Let $\Gamma$ be a super $(d, \varepsilon)$-regular
graph with bipartition $(V_1, V_2)$, $|V_1| = |V_2| = m > m_0(\varepsilon)$, and let $S_i \subseteq V_1$,
$d_1 m < |S_i| = s_i < d_2 m$, and $T_i \subseteq V_2$, $d_1 m < |T_i| = t_i < d_2 m$, $i = 1, 2, \dots, k$,
where $k = O(m^2)$. Then, for a perfect matching $\sigma$ of $\Gamma$ drawn randomly with
the uniform distribution, the event that for each $i = 1, 2, \dots, k$

$$s_i t_i/m - g(\varepsilon)m < |\sigma(S_i) \cap T_i| < s_i t_i/m + g(\varepsilon)m \quad (6)$$

holds with probability at least $1 - c^m$.
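A quick simulation illustrates the concentration asserted by the Catching Lemma in the simplest special case, assumed here purely for illustration: $\Gamma = K_{m,m}$, whose uniformly random perfect matchings are just uniformly random permutations. This is not a proof and does not cover general super regular graphs.

```python
import random


def catch_count(sigma, S, T):
    """|sigma(S) & T|: number of edges of the matching sigma from S into T."""
    return sum(1 for x in S if sigma[x] in T)


def max_deviation(m, s, t, trials, seed=0):
    """Largest |catch_count - s*t/m| over `trials` uniformly random perfect
    matchings of the complete bipartite graph K_{m,m} (random permutations)."""
    rng = random.Random(seed)
    S = set(range(s))            # S_i inside V_1 = {0, ..., m-1}
    T = set(range(m - t, m))     # T_i inside V_2 = {0, ..., m-1}
    worst = 0.0
    for _ in range(trials):
        sigma = list(range(m))
        rng.shuffle(sigma)       # uniform perfect matching of K_{m,m}
        worst = max(worst, abs(catch_count(sigma, S, T) - s * t / m))
    return worst
```

For $m = 1000$ and $s_i = t_i = 400$ the expected count is $s_i t_i/m = 160$, and the observed deviations are typically a few tens at most, i.e. a small fraction of $m$.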

As was explained in Sect. 3, we need to design a subroutine CATCH
which derandomizes the above lemma simultaneously for all pairs $(S_i, T_i)$, i.e. a
procedure which constructs the desired perfect matching of $\Gamma$.

How can this lemma be derandomized, i.e. how can the required perfect matching
be found effectively? The most straightforward approach seems to be the Erdős–Selfridge
method of conditional probabilities (see [4]) and the technique of pessimistic
estimators (see [12]). In basic terms, pessimistic estimators are easily
computable functions that provide upper bounds on probabilities
that cannot be computed efficiently. Here, the situation calls for just such a technique,
since it is not known how to compute the exact probability of the event
that there exists an $i$ such that $|\sigma(S_i) \cap T_i|$ falls outside the required interval.
However, as these upper bounds hold only as long as the residual subgraph
of $\Gamma$ remains reasonably large, at some point we need to stop the
procedure and complete the current matching by any matching we can
find in the leftover graph. The problem we immediately face is that this leftover
subgraph, though $\varepsilon'$-regular for some $\varepsilon'$, need not be super regular and
therefore may not have a perfect matching at all.

To overcome this difficulty we invoke a new idea. Take a subgraph $\Gamma'$ of $\Gamma$ on
half of the vertices so that the other half is super regular and every set $S_i$ and $T_i$
is roughly halved, and find a suitable almost perfect matching in $\Gamma'$. By suitable
we mean one which satisfies (6) with respect to the halves of the sets $S_i$ and
$T_i$, and with $g(\varepsilon)$ replaced by $\frac{1}{3}g(\varepsilon)$. We then move the leftover vertices to the
other half, which still remains super regular, and repeat as long as the remaining
part is larger than $\frac{1}{3}g(\varepsilon)m$. Finally we find a perfect matching in the leftover
subgraph of $\Gamma$, which is so small that it cannot affect the required property of
the constructed perfect matching $\sigma$. (The third $1/3$ serves as a cushion for all
the inaccuracies we commit along the way.)

More formally, we prove the following theorem. 

Theorem 3. There is a polynomial time algorithm CATCH which, for a given
super $(d, \varepsilon)$-regular bipartite graph $\Gamma$ and $k = O(m^2)$ pairs of sets $(S_i, T_i)$ as in
Lemma 4, finds a perfect matching $\sigma$ of $\Gamma$ such that for each $i = 1, \dots, k$ the
inequalities (6) hold.

The algorithm CATCH uses 3 subroutines: HALF, MATCH and HALL. Procedure
HALL just finds a perfect matching in any bipartite graph satisfying
Hall's condition. We may use, for instance, the algorithm of Hopcroft and Karp,
given in [6], of complexity $O(m^{5/2})$.
Procedure HALF is defined in Lemma 2.
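Procedure HALL only needs some perfect matching algorithm. As an illustration, the sketch below uses Kuhn's simpler augmenting-path method, which runs in $O(|V||E|)$ time, instead of Hopcroft–Karp; it is a stand-in for illustration, not the procedure used in the paper.

```python
def perfect_matching(adj, U):
    """Kuhn's augmenting-path algorithm for bipartite matching.

    adj : dict u -> iterable of neighbours v
    Returns a dict u -> v matching every u in U, or None if no matching
    covers U (i.e. Hall's condition fails).
    """
    match_of_v = {}  # v -> u

    def augment(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            # v is free, or its current partner can be re-matched elsewhere
            if v not in match_of_v or augment(match_of_v[v], seen):
                match_of_v[v] = u
                return True
        return False

    for u in U:
        if not augment(u, set()):
            return None
    return {u: v for v, u in match_of_v.items()}
```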

Procedure MATCH does the same job as CATCH, but in the space of almost
perfect matchings, i.e. matchings covering all but a small number of vertices on each side.
With the same input as CATCH, it outputs an almost perfect matching $\sigma'$ which
satisfies condition (6) with $g(\varepsilon)$ replaced by $\frac{1}{3}g(\varepsilon)$, along with a pair of leftover
sets $L_U$ and $L_V$ of equal size.

Algorithm MATCH is based on a probabilistic lemma very similar to Lemma
4 above. To derandomize that lemma we do apply the Erdős–Selfridge method
of conditional probabilities. As we are now after an almost perfect matching, the
problem of finishing it off no longer exists. For details see the full version of the paper.

We conclude this extended abstract with a description of algorithm CATCH.
Given two disjoint matchings $\sigma$ and $\sigma'$, their union will be denoted by $\sigma + \sigma'$.

Algorithm CATCH

Input: A super $(d, \varepsilon)$-regular graph $\Gamma = (U, V; E)$ with $|U| = |V| = m$, and
sets $(S_i, T_i)$, $i = 1, \dots, k$, as in Theorem 3.

Output: A perfect matching $\sigma : U \to V$ that satisfies, for each $i = 1, 2, \dots, k$,
inequalities (6).

1. $\sigma \leftarrow \emptyset$.

2. WHILE $|U| > \frac{1}{3}g(\varepsilon)m$ DO:

(a) Apply HALF to $\{U;\ S_i, i = 1, \dots, k;\ N_\Gamma(v), v \in V\}$. Denote the output by
$U_1$ and $U_2$.

(b) Apply HALF to $\{V;\ T_i, i = 1, \dots, k;\ N_\Gamma(u), u \in U\}$. Denote the output by
$V_1$ and $V_2$.

(c) Apply MATCH to $\Gamma[U_1, V_1]$ and $(S_i \cap U_1, T_i \cap V_1)$, $i = 1, \dots, k$. Denote
the output by $\sigma'$, $L_U$ and $L_V$.

(d) $\sigma \leftarrow \sigma + \sigma'$,

$U \leftarrow U_2 \cup L_U$, $V \leftarrow V_2 \cup L_V$,

$\Gamma \leftarrow \Gamma[U, V]$,

$S_i \leftarrow S_i \cap U$, $T_i \leftarrow T_i \cap V$, $i = 1, \dots, k$.

3. Apply HALL to $\Gamma[U, V]$. Denote the output by $\sigma'$;
$\sigma \leftarrow \sigma + \sigma'$.
Abstract. Given a hypergraph and a set of colors, we want to find a
vertex coloring that minimizes the size of the largest monochromatic set in an edge.
We give deterministic polynomial time approximation algorithms with
performance close to the best bounds guaranteed by existential arguments.
This can be applied to support divide and conquer approaches to
various problems. We give two examples. For deterministic approximate
DNF counting, this helps us explore the importance of a previously ignored
parameter, the maximum number of appearances of any variable,
and construct algorithms that are particularly good when this parameter
is small. For partially ordered sets, we are able to constructivize the
dimension bound given by Füredi and Kahn [5].



1 Introduction 

A hypergraph $H(V, E)$ consists of a set $V$ of nodes and a set $E$ of edges, where
each edge is a subset of nodes. An undirected graph is just a hypergraph where
each edge contains exactly two nodes. There are four parameters associated with
a hypergraph:

- $n = |V|$, the number of nodes.

- $m = |E|$, the number of edges.

- $t = \max\{|e| : e \in E\}$, the size of the largest edge. We will write $E \subseteq \binom{V}{\le t}$.

- $d = \max\{\deg(v) : v \in V\}$, where $\deg(v) = |\{e \in E : v \in e\}|$; $d$ is called the
degree of H.
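The four parameters can be read off an explicit edge list. A minimal sketch, assuming nodes are hashable values and edges are Python sets (the function name is hypothetical):

```python
from collections import Counter


def hypergraph_params(V, E):
    """Return (n, m, t, d) for a hypergraph with node set V and edge list E."""
    deg = Counter(v for e in E for v in e)      # deg(v) = number of edges containing v
    n, m = len(V), len(E)
    t = max((len(e) for e in E), default=0)     # size of the largest edge
    d = max((deg[v] for v in V), default=0)     # degree of H
    return n, m, t, d
```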

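Given $k$ colors, we want to color nodes so that no color appears more than
$c$ times in any edge. Clearly $c \ge \mu = t/k$, and we want to have $c = \mu\alpha$ for $\alpha$ as
small as possible. This problem was studied before by Srinivasan [11], who gave
the following (nonconstructive) existential bound for $c$:

$$c = \begin{cases} O(\mu) & \text{if } \mu = \Omega(\log d), \\ O\!\left(\dfrac{\log d}{\log((\log d)/\mu)}\right) & \text{otherwise.} \end{cases}$$

We give a deterministic polynomial time algorithm for finding a coloring with

$$c = \begin{cases} O(\mu) & \text{if } \mu = \Omega(\log(td)), \\ O\!\left(\dfrac{\log(td)}{\log(\log(td)/\mu)}\right) & \text{otherwise.} \end{cases} \quad (1)$$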
© Springer-Verlag Berlin Heidelberg 1998







C.J. Lu 



When $t = \cdots$ or $\mu = \Omega(\log(td))$, our bound for $c$ is within a constant factor
of the bound given by Srinivasan. Also, for our applications below, our bound
suffices. In fact, a much more involved method can actually constructivize Srinivasan's
bound for $c$, but we leave it to a later paper.

Notice that such a $k$-coloring partitions the original hypergraph into $k$ sub-hypergraphs,
one for each color. In each sub-hypergraph, every edge now has at
most $c$ nodes. This turns out to support some divide and conquer approaches.
We will give two examples.
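For a concrete coloring, the smallest $c$ it achieves and the $k$ color-restricted sub-hypergraphs described above are easy to compute. A sketch with hypothetical helper names, edges given as Python sets and the coloring as a dictionary:

```python
from collections import Counter


def max_monochromatic(E, color):
    """Largest number of nodes of a single color inside any edge."""
    return max((max(Counter(color[v] for v in e).values()) for e in E if e),
               default=0)


def split_by_color(E, color, k):
    """For each color i, the sub-hypergraph whose edges are the original
    edges restricted to the nodes of color i."""
    return [[{v for v in e if color[v] == i} for e in E] for i in range(k)]
```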

Our first application is to the deterministic DNF approximate counting problem.
Given a DNF formula $F$ of $n$ variables and $m$ terms, we want to estimate
its volume, defined as

$$\mathrm{vol}(F) = P_{x \in \{0,1\}^n}[F(x) = 1],$$

within an additive error $\varepsilon$. Luby, Velickovic, and Wigderson [7], following the
work of Nisan [8] [9], gave a deterministic $\cdots$ time algorithm. Luby and
Velickovic [6] gave a deterministic $2^{(\cdots)}$ time algorithm,
which is good when a large error $\varepsilon$ is allowed. Note that a DNF formula $F$ can
be naturally modeled by a hypergraph, with nodes corresponding to variables
and edges corresponding to terms. The degree $d$ of the hypergraph then indicates
the maximum number of times a variable is read in $F$. This parameter has
not received attention before for this problem, and is our focus here. We construct
deterministic algorithms with running times $2^{(\cdots)}$ and $2^{(\cdots)}$, respectively. Note that $d$ is at most $m$, so our
first algorithm is never worse than that of Luby, Velickovic, and Wigderson [7],
and is particularly good when $d$ is small and $\varepsilon$ is large. Our second algorithm
is better than that of Luby and Velickovic [6] when $d < 2^{\cdots}$, and is better
than our first algorithm when $d < 2^{\cdots}$.

Our second application is to dimensions of partially ordered sets (posets).
Let $(P, <)$ be a poset. Its dimension, denoted as $\dim(P)$, is defined to be the
minimum number of linear extensions $L_1, \dots, L_d$ such that $P = L_1 \cap \dots \cap L_d$
(i.e., $x < y$ iff $x <_{L_i} y$ for all $i$). For $x \in P$, let $U(x) = \{y \in P : y > x\}$ be the
set of upper bounds for $x$, and let $C(x) = \{y \in P : y > x \text{ or } y < x\}$ be the set
of elements comparable to $x$. Füredi and Kahn [5] gave the following existential
bound: for some constants $c_1$ and $c_2$,

$$\dim(P) \le r = \min\{c_1 t \log^2 t,\ c_2 u \log |P|\},$$

where $t = \max_{x \in P} |C(x)|$ and $u = \max_{x \in P} |U(x)|$. One key ingredient in their
proof is the hypergraph coloring problem, where a poset $(P, <)$ is modeled by a
hypergraph $H(V, E)$ with $V = P$ and $E = \{U(x) : x \in P\}$. Using our coloring
algorithm, together with other ideas, we are able to constructivize their existential
bound. That is, we give a deterministic polynomial time algorithm for finding
$O(r)$ linear extensions with intersection equal to the given poset.

We believe that there should be more applications of our hypergraph coloring 
algorithm. 




Deterministic Hypergraph Coloring and Its Applications 






2 Hypergraph Coloring 



Consider a hypergraph $H(V, E)$ with $n = |V|$, $m = |E|$, $E \subseteq \binom{V}{\le t}$, and degree
$d$. We want to color nodes with $k$ colors such that no edge contains $c$ nodes of
the same color. For a coloring, call an edge bad if it contains $c$ nodes of the same
color, and call a set of edges bad if all edges in it are bad. Our goal is to find a
good $k$-coloring $\gamma$, i.e. one in which no edge is bad. If we choose $\gamma$ randomly, then for an
edge $e$,

$$P[e \text{ is bad}] \le \binom{t}{c} k^{1-c}.$$

From the Lovász local lemma [4], a good $k$-coloring exists provided $\cdots$.
However, a random $k$-coloring is sometimes good only with exponentially small
probability, and it is not obvious how to find such a good coloring, even probabilistically.
Beck [2] had the first success in derandomizing the local lemma, and
Alon [1] later adapted Beck's idea to derandomize more applications of the local
lemma. We will follow their approach closely.
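For small parameters, the probability that a uniformly random $k$-coloring makes an edge of size $t$ bad (some color appearing at least $c$ times) can be computed exactly and compared with the union bound $\binom{t}{c}k^{1-c}$, obtained by summing over the $\binom{t}{c}$ subsets of size $c$ and the $k$ colors. A brute-force sketch, feasible only for tiny $t$ and $k$:

```python
from itertools import product
from math import comb


def prob_bad_exact(t, k, c):
    """P[some color appears at least c times] for an edge of t nodes colored
    independently and uniformly with k colors (exhaustive enumeration)."""
    bad = sum(1 for col in product(range(k), repeat=t)
              if max(col.count(i) for i in range(k)) >= c)
    return bad / k ** t


def union_bound(t, k, c):
    """Sum over the C(t, c) c-subsets and k colors of P[subset has that color]."""
    return comb(t, c) * k ** (1 - c)
```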

Our main result in this section is a deterministic polynomial time algorithm
to find a good $k$-coloring, satisfying $\cdots$,
which leads to the bound for $c$ given in Eq. (1). For this value of $c$, a random
$(k/3)$-coloring makes an edge $e$ bad with probability less than $p = (\cdots)^c$. We first
give a randomized algorithm and then we derandomize it.



2.1 A Randomized Algorithm 

There will be at most three phases, each using a distinct set of $k/3$ colors. The
intuition is that a random $(k/3)$-coloring is unlikely to have a large cluster of bad
edges and that each bad cluster can be recolored separately, for a proper definition
of "cluster". For a hypergraph $H$, its line graph $L_H$ is the graph whose
nodes are the edges of $H$, two nodes being adjacent iff the corresponding edges
in $H$ intersect. Let $L_H^{(a,b)}$ be the graph with the same node set, but now two
nodes are adjacent iff their distance in $L_H$ is exactly $a$ or $b$. Call a set of edges
in $H$ an $(a, b)$-tree if the corresponding nodes in $L_H^{(a,b)}$ are connected. Our algorithm
consists of phases. In the first phase, we find a $(k/3)$-coloring such that all
bad $(1, 2)$-trees have size $O(dt \log m)$. Then we try to recolor each bad $(1, 2)$-tree
separately, using a new set of $k/3$ colors. If $m$ is small, the recoloring can be done
in the second phase. Otherwise we need another phase, using another set of $k/3$
colors.
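The definition of an $(a, b)$-tree can be illustrated directly: compute distances in the line graph $L_H$ by breadth-first search, then test connectivity in $L_H^{(a,b)}$. In this sketch edges are referred to by their position in a Python list, the names are hypothetical, and no attempt is made at efficiency.

```python
from collections import deque


def line_graph_distances(E):
    """BFS distances in the line graph L_H: nodes are edge indices of H,
    adjacent iff the corresponding edges intersect."""
    m = len(E)
    nbr = {i: [j for j in range(m) if j != i and E[i] & E[j]] for i in range(m)}
    dist = {}
    for s in range(m):
        d = {s: 0}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in nbr[x]:
                if y not in d:
                    d[y] = d[x] + 1
                    queue.append(y)
        dist[s] = d
    return dist


def is_ab_tree(S, E, a, b):
    """True iff the edges with indices in S induce a connected subgraph of
    L_H^(a,b), where two nodes are adjacent iff their L_H-distance is a or b."""
    dist = line_graph_distances(E)
    S = list(S)
    if not S:
        return False
    seen, stack = {S[0]}, [S[0]]
    while stack:
        x = stack.pop()
        for y in S:
            if y not in seen and dist[x].get(y) in (a, b):
                seen.add(y)
                stack.append(y)
    return len(seen) == len(S)
```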

Phase 1: 

In this phase, we will find a $(k/3)$-coloring such that all bad $(1, 2)$-trees have size
$O(dt \log m)$. First we need the following lemma:
Lemma 1. For some $u = O(\log m / \log(dt))$, the probability that a random $(k/3)$-coloring
has a bad $(2, 3)$-tree of size $u$ is $\cdots$.

Proof: Any two edges of a $(2, 3)$-tree have no node in common, so the events of
their being bad are independent. As there are at most $\cdots$ $(2, 3)$-trees
of size $u$, and each one is bad with probability at most $p^u$, the probability
that a random $(k/3)$-coloring has a bad $(2, 3)$-tree of size $u$ is at most $\cdots$.

□

As a bad $(1,2)$-tree of size $dtu$ must contain a bad $(2,3)$-tree of size $u$, a random $(k/3)$-coloring
with high probability has no bad $(1,2)$-tree of size $dtu = O(dt \log m)$.
In the next subsection, we will show how to find such a $(k/3)$-coloring deterministically,
by using the standard technique of conditional probability with a pessimistic
estimator.

Phase 2: 

Suppose we have found a $(k/3)$-coloring with no bad $(1,2)$-tree of size $dtu$. Then
we try to recolor these bad $(1,2)$-trees. Let $T = (V_T, E_T)$ be a bad $(1,2)$-tree.
When we recolor nodes in $T$, those good edges intersecting $T$ are also affected,
and we want to make sure that they won't turn bad after the recoloring. So,
together with $T$, we also take into account those good edges, but with nodes not
in $T$ removed, and consider the coloring problem for this hypergraph $S$. More
precisely, $S = (V_S, E_S)$, where $V_S = V_T$ and $E_S = \{e \cap V_S : e \in E_H,\ e \cap V_S \ne \emptyset\}$.
It is easy to see that the condition of the local lemma still holds and a good
$(k/3)$-coloring exists for $S$. We will use a different set of $k/3$ colors in this phase. If we
find a good $(k/3)$-coloring for $S$, then after this recoloring, no edge of $H$ intersecting
$T$ is bad. Now, as each edge of $H$ intersects at most one bad $(1,2)$-tree, we can
repeat this recoloring process for each bad $(1,2)$-tree, using the same new set
of $k/3$ colors.
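Building the recoloring instance $S$ is a direct set computation. A minimal sketch with a hypothetical helper name, where edges are Python sets and the tree $T$ is given by its list of edges:

```python
def recoloring_instance(tree_edges, E):
    """Hypergraph S = (V_S, E_S) for recoloring a bad (1,2)-tree T:
    V_S is the node set of T and E_S = {e & V_S : e in E, e & V_S nonempty}."""
    V_S = set().union(*tree_edges)
    E_S = {frozenset(e & V_S) for e in E if e & V_S}
    return V_S, E_S
```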

Note that now $|E_S| \le (dt)^2 u$. Suppose $\sqrt{\log m / \log\log m} < dt$. Then the
probability that a random $(k/3)$-coloring has a bad edge in $S$ is at most

$$|E_S|\, p < 1.$$

We can find a good $(k/3)$-coloring in deterministic polynomial time, using again the
technique of conditional probability.

Otherwise, when $dt \le \sqrt{\log m / \log\log m}$, we can find a $(k/3)$-coloring such that
all bad $(1,2)$-trees have size at most

$$O\big(dt \log((dt)^2 \log m) / \log(dt)\big) = O\big(\sqrt{\log m \, \log\log m} / \log(dt)\big),$$

similarly to Phase 1. Then we enter Phase 3.

Phase 3: 
Now, as $t < \sqrt{\log m / \log\log m}$, each bad $(1,2)$-tree has $O(\log m / \log(dt))$
nodes, and we can use an exhaustive search to find a good $(k/3)$-coloring in deterministic
$(k/3)^{O(\log m / \log(dt))} = m^{O(1)}$ time.
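Phase 3 only needs a brute-force search over all colorings of the small node set of one bad tree. A sketch, in which `q` stands for the number $k/3$ of fresh colors and an edge counts as good if it has fewer than $c$ nodes of any one color; the function name is hypothetical.

```python
from collections import Counter
from itertools import product


def exhaustive_good_coloring(nodes, edges, q, c):
    """Try all q^len(nodes) colorings; return the first (as a dict) in which
    no edge contains c nodes of the same color, or None if none exists."""
    nodes = list(nodes)
    for cols in product(range(q), repeat=len(nodes)):
        color = dict(zip(nodes, cols))
        if all(max(Counter(color[v] for v in e).values(), default=0) < c
               for e in edges):
            return color
    return None
```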

2.2 Derandomization of Phase 1 

Let $R$ denote the set of all $(2,3)$-trees of size $u$, with $u$ as in Lemma 1, and let $B$ denote
the event that some tree in $R$ is bad. From Lemma 1, we know that the bad
event $B$ is unlikely to happen under a random $(k/3)$-coloring. But how do we find
a good coloring deterministically? The idea is to use the standard technique of
conditional probability with a pessimistic estimator, introduced by Raghavan
[10]. We color the nodes one by one. The color of each node is chosen to
minimize the probability of the bad event $B$ if we randomly $(k/3)$-color
the remaining nodes. The hope is that the final coloring is a good one, because
the final conditional probability, which is either 0 or 1, is at most the original
unconditional one, which is less than 1. However, it is not easy to compute the
exact conditional probability at each step here, so we use a pessimistic estimator
instead.

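Suppose that we have already assigned colors $\gamma_1, \dots, \gamma_i$ to nodes $v_1, \dots, v_i$.
We will overestimate the conditional probability

$$P_i(\gamma_1, \dots, \gamma_i) = P_{\gamma_{i+1}, \dots, \gamma_n}[B \mid \gamma_1, \dots, \gamma_i]$$

by the following pessimistic estimator:

$$\Lambda_i(\gamma_1, \dots, \gamma_i) = \sum_{T \in R} \prod_{e \in T} \sum_{I \subseteq e,\, |I| = c} P_{\gamma_{i+1}, \dots, \gamma_n}[\mathrm{mono}(I) \mid \gamma_1, \dots, \gamma_i],$$

where $\mathrm{mono}(I)$ denotes the event that $I$ is monochromatic. It is easy to see
that $P_i(\gamma_1, \dots, \gamma_i) \le \Lambda_i(\gamma_1, \dots, \gamma_i)$ for all $i$ and all $\gamma_1, \dots, \gamma_i$, and also that
$\Lambda_0 < 1$ by Lemma 1.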

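Now,

$$\begin{aligned}
\Lambda_i(\gamma_1, \dots, \gamma_i) &= \sum_{T \in R} \prod_{e \in T} \sum_{I \subseteq e,\, |I| = c} E_{\gamma_{i+1}} P_{\gamma_{i+2}, \dots, \gamma_n}[\mathrm{mono}(I) \mid \gamma_1, \dots, \gamma_{i+1}] \\
&= E_{\gamma_{i+1}} \sum_{T \in R} \prod_{e \in T} \sum_{I \subseteq e,\, |I| = c} P_{\gamma_{i+2}, \dots, \gamma_n}[\mathrm{mono}(I) \mid \gamma_1, \dots, \gamma_{i+1}] \\
&= E_{\gamma_{i+1}} \Lambda_{i+1}(\gamma_1, \dots, \gamma_{i+1}),
\end{aligned}$$

where the second equality holds because each edge $e$ in $T$ intersects no other edge
in $T$, so at most one factor of each product depends on $\gamma_{i+1}$. We pick $\gamma_{i+1}$ to minimize $\Lambda_{i+1}(\gamma_1, \dots, \gamma_i, \gamma_{i+1})$. Then

$$\Lambda_{i+1}(\gamma_1, \dots, \gamma_{i+1}) = \min_{\gamma} \Lambda_{i+1}(\gamma_1, \dots, \gamma_i, \gamma) \le E_{\gamma_{i+1}} \Lambda_{i+1}(\gamma_1, \dots, \gamma_{i+1}) = \Lambda_i(\gamma_1, \dots, \gamma_i),$$

and hence

$$1 > \Lambda_0 \ge \Lambda_1(\gamma_1) \ge \Lambda_2(\gamma_1, \gamma_2) \ge \dots \ge \Lambda_n(\gamma_1, \dots, \gamma_n) \ge P_n(\gamma_1, \dots, \gamma_n).$$

$P_n(\gamma_1, \dots, \gamma_n)$ is either 1 or 0, depending on whether the coloring $\gamma_1, \dots, \gamma_n$
results in a bad $(2,3)$-tree of size $u$. As $P_n(\gamma_1, \dots, \gamma_n) < 1$, there is no bad
$(2,3)$-tree of size $u$, and we have found a good $(k/3)$-coloring.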
It remains to show that, for any $i$ and any $\gamma_1, \dots, \gamma_i$, $\Lambda_i(\gamma_1, \dots, \gamma_i)$ can be
computed efficiently. There are $\cdots$ $(2,3)$-trees of size $u$ in $R$. It can be shown
that enumerating all of them takes polynomial time; we omit the proof here. For
each $(2,3)$-tree $T$ and any edge $e$ in $T$, $\sum_{I \subseteq e,\, |I| = c} P_{\gamma_{i+1}, \dots, \gamma_n}[\mathrm{mono}(I) \mid \gamma_1, \dots, \gamma_i]$
can also be easily computed. So $\Lambda_i(\gamma_1, \dots, \gamma_i)$ can be computed in deterministic
polynomial time.
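A small sketch of the method of conditional probabilities with this estimator, using exact rational arithmetic. Simplifying assumptions, not taken from the paper: the family $R$ is passed explicitly as a list of trees, each a list of pairwise disjoint edges (so that the averaging identity above holds), and `q` plays the role of $k/3$. The conditional probability of $\mathrm{mono}(I)$ is 0 if $I$ already contains two colors, $q^{-f}$ if its colored nodes share one color and $f$ of its nodes are free, and $q^{1-f}$ if none of its nodes is colored.

```python
from fractions import Fraction
from itertools import combinations


def cond_mono(I, color, q):
    """P[I is monochromatic | partial coloring `color`], where the still
    uncolored nodes of I get independent uniform colors from q colors."""
    fixed = {color[v] for v in I if v in color}
    free = sum(1 for v in I if v not in color)
    if len(fixed) > 1:
        return Fraction(0)                 # two different colors already present
    if len(fixed) == 1:
        return Fraction(1, q ** free)      # each free node must match that color
    return Fraction(1, q ** (free - 1))    # nothing fixed: any common color works


def estimator(R, color, q, c):
    """Lambda_i: sum over trees T in R of the product over edges e in T of
    the sum over c-subsets I of e of P[mono(I) | color]."""
    total = Fraction(0)
    for T in R:
        prod = Fraction(1)
        for e in T:
            prod *= sum((cond_mono(I, color, q) for I in combinations(sorted(e), c)),
                        Fraction(0))
        total += prod
    return total


def greedy_coloring(nodes, R, q, c):
    """Color nodes one by one, each time choosing the color that minimizes
    the pessimistic estimator."""
    color = {}
    for v in nodes:
        color[v] = min(range(q), key=lambda a: estimator(R, {**color, v: a}, q, c))
    return color
```

With trees of size one, as in the usage below, this reduces to the classical derandomized union bound over individual edges.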

3 DNF Approximate Counting 

Each finite set is associated with a natural distribution, the uniform distribution
over its elements, and we won't make a distinction between a set and its
natural distribution when it's clear from the context. Given a DNF formula $F$
on $n$ variables, we'd like to know its volume, $\mathrm{vol}(F) = P_{x \in \{0,1\}^n}[F(x) = 1]$.
Valiant [12] has shown that it is #P-complete to compute the exact value, so we
settle for an approximation. The standard approach is to find a pseudorandom
distribution, generated from far fewer random bits, that can still fool $F$.

Definition 1 A function $g : \{0,1\}^r \to \{0,1\}^n$ is called an $\varepsilon$-generator for a
boolean function $F : \{0,1\}^n \to \{0,1\}$ if

$$\left| P_{x \in \{0,1\}^n}[F(x) = 1] - P_{y \in \{0,1\}^r}[F(g(y)) = 1] \right| \le \varepsilon.$$

A function $g$ is called an $\varepsilon$-generator for a class of boolean functions if it is an
$\varepsilon$-generator for each function in this class.

So the algorithm for approximating $\mathrm{vol}(F)$ is to find an $\varepsilon$-generator $g$ for
$F$ and then compute $2^{-r}\sum_{y \in \{0,1\}^r} F(g(y))$, the expected value of $F$ over the
pseudorandom distribution generated by $g$. The running time is proportional to
$2^r$, and the key point is to reduce $r$. Notice that we can use a different function
$g$ for each $F$.
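The averaging step can be sketched as follows. A DNF formula is represented, as an assumption for illustration, as a list of terms, each a list of (variable index, required bit) literals. The brute-force `volume` is exponential in $n$ and is included only to measure the error of a generator on toy examples.

```python
from itertools import product


def evaluate_dnf(F, x):
    """F: list of terms, each a list of (variable index, required bit) literals."""
    return any(all(x[i] == bit for i, bit in term) for term in F)


def volume(F, n):
    """vol(F) = P_x[F(x) = 1] by brute force over all 2^n inputs."""
    return sum(evaluate_dnf(F, x) for x in product((0, 1), repeat=n)) / 2 ** n


def generator_average(F, g, r):
    """Average of F over g({0,1}^r); the cost is 2^r evaluations of F."""
    return sum(evaluate_dnf(F, g(y)) for y in product((0, 1), repeat=r)) / 2 ** r
```

For example, $F = (x_0 \wedge x_1) \vee \bar{x}_2$ has volume $5/8$, while the generator $g(y_0, y_1) = (y_0, y_0, y_1)$ gives average $3/4$, so it is a $1/8$-generator for this $F$.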

Clearly there are three important parameters that help determine the difficulty
of this problem: the number $n$ of variables, the number $m$ of terms, and
the error $\varepsilon$ allowed. We discover the importance of another parameter $d$, the
maximum number of terms in which a variable can appear. Let $\mathrm{DNF}_d$ denote the set
of DNF formulas with each variable appearing in at most $d$ terms. Such formulas
are usually called read-$d$-times DNF formulas. In the following, we will also
assume that each term in a formula $F$ contains at most $t = \log(m/\varepsilon)$ literals. This
is because we can always remove those terms containing more than $t$ literals to
get another formula $F'$ such that $|P_x[F(x) = 1] - P_x[F'(x) = 1]| \le \varepsilon$, and then
consider the formula $F'$ instead. (Each removed term is satisfied with probability less
than $2^{-t} = \varepsilon/m$, and there are at most $m$ of them.) Let $t\mathrm{DNF}_d$ denote the set of $\mathrm{DNF}_d$ formulas
with no term containing more than $t$ literals. This is the class of formulas we
consider in this section. For convenience, we also assume that $n = \cdots$.
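Both preprocessing steps (dropping long terms and reading off $d$) are simple. A sketch, assuming the logarithm in $t = \log(m/\varepsilon)$ is to base 2 and using the same illustrative representation of terms as lists of (variable, bit) literals:

```python
import math
from collections import Counter


def drop_long_terms(F, eps):
    """Remove terms with more than t = log2(m / eps) literals.

    A term with more than t literals is satisfied with probability below
    2^-t = eps / m, and there are at most m such terms, so the volume
    changes by at most eps."""
    m = len(F)
    t = math.log2(m / eps)
    return [term for term in F if len(term) <= t], t


def read_degree(F):
    """d: the maximum number of terms in which any variable appears."""
    counts = Counter(i for term in F for i in {v for v, _ in term})
    return max(counts.values(), default=0)
```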

A $t\mathrm{DNF}_d$ formula $F$ of $n$ variables and $m$ terms can be seen as a hypergraph
$H(V, E)$ with $|V| = n$, $|E| = m$, $E \subseteq \binom{V}{\le t}$ and degree $d$. We can use the
algorithm in the previous section to find a $k$-coloring of variables such that no
$c$ literals in a term are monochromatic, for some $k$ and $c$ to be chosen later.
Let $V_i$, $1 \le i \le k$, be the set of variables with color $i$. If we fix values of all
variables not in $V_i$, we get a $c\mathrm{DNF}_d$ formula on $V_i$. Suppose that, for $1 \le i \le k$,
$g_i : \{0,1\}^{r_i} \to \{0,1\}^{|V_i|}$ is an $\varepsilon$-generator for all $c\mathrm{DNF}_d$ formulas on the variable
set $V_i$. Define $g : \{0,1\}^r \to \{0,1\}^n$, with $r = r_1 + \dots + r_k$ and $n = |V_1| + \dots + |V_k|$,
such that the bits corresponding to $V_i$ are generated by $g_i$.
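The composed generator $g$ simply concatenates the seeds of the $g_i$. A sketch with hypothetical names, in which each $g_i$ is any Python function from an $r_i$-bit tuple to a $|V_i|$-bit tuple:

```python
def combine_generators(blocks, gens, rs, n):
    """Build g: {0,1}^r -> {0,1}^n with r = sum(rs) from per-color generators.

    blocks[i] : list of the variable indices in V_i
    gens[i]   : function mapping an rs[i]-bit tuple to a len(blocks[i])-bit tuple
    The seed is cut into consecutive pieces of lengths rs[0], rs[1], ...;
    the i-th piece is fed to gens[i], whose output fills the bits of V_i."""
    def g(y):
        x = [0] * n
        pos = 0
        for V_i, g_i, r_i in zip(blocks, gens, rs):
            for var, bit in zip(V_i, g_i(tuple(y[pos:pos + r_i]))):
                x[var] = bit
            pos += r_i
        return tuple(x)
    return g
```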

Lemma 2. The function $g$ defined above is a $k\varepsilon$-generator for $F$.

Proof: For $1 \le i \le k$, let $U_i$ denote the uniform distribution over $\{0,1\}^{|V_i|}$ for
the variables in $V_i$, and let $S_i$ be the corresponding pseudorandom distribution
generated by $g_i$. Let $D_i$ denote the distribution $S_1 \times \dots \times S_i \times U_{i+1} \times \dots \times U_k$,
and let $D_i'$ denote the distribution $S_1 \times \dots \times S_{i-1} \times U_{i+1} \times \dots \times U_k$. For $y \in D_i'$,
let $F_y$ denote the formula resulting from $F$ by assigning the value $y$ to the
corresponding variables; $F_y$ is a $c\mathrm{DNF}_d$ formula on the variable set $V_i$. Then

$$\begin{aligned}
|\mathrm{vol}(F) - P_{x \in D_k}[F(x) = 1]| &= |P_{x \in D_0}[F(x) = 1] - P_{x \in D_k}[F(x) = 1]| \\
&\le \sum_{i=0}^{k-1} |P_{x \in D_i}[F(x) = 1] - P_{x \in D_{i+1}}[F(x) = 1]| \\
&\le \sum_{i=1}^{k} E_{y \in D_i'} |P_{z \in S_i}[F_y(z) = 1] - P_{z \in U_i}[F_y(z) = 1]| \\
&\le k\varepsilon.
\end{aligned}$$

$D_k$ is the pseudorandom distribution generated by $g$, so $g$ is a $k\varepsilon$-generator for
$F$. □

It remains to find such $\varepsilon$-generators for $c\mathrm{DNF}_d$. We will give two constructions
according to two different choices of $k$ and $c$.

3.1 Construction I, $c = O(\log \cdots)$ and $k = O(t/c)$

For a DNF formula $G$, a subformula of $G$ is a formula with some of $G$'s terms
removed. Let $l = 2^c \ln(1/\varepsilon)$ and $m' = d(l - 1)c$. We will see
that any $c\mathrm{DNF}_d$ formula $G$ has a subformula of at most $m'$ terms with almost
the same volume. This suggests the following lemma.

Lemma 3. Suppose that $G$ is a $c\mathrm{DNF}_d$ formula and $g$ is an $\varepsilon$-generator for all
subformulas of $G$ with at most $m'$ terms. Then $g$ is a $2\varepsilon$-generator for $G$.

Proof: When $G$, after simplification, has at most $m'$ terms, $g$ is certainly an $\varepsilon$-generator
for $G$. When $G$ has more than $m'$ terms, it has $l$ pairwise disjoint terms, because
otherwise some variable would appear more than $m'/((l - 1)c) = d$ times. Let $T$
denote the OR of those $l$ disjoint terms. Then

$$1 \ge \mathrm{vol}(G) \ge \mathrm{vol}(T) \ge 1 - (1 - 2^{-c})^l \ge 1 - \varepsilon.$$
As $g$ is an $\varepsilon$-generator for $T$, we have

$$1 \ge P_y[G(g(y)) = 1] \ge P_y[T(g(y)) = 1] \ge \mathrm{vol}(T) - \varepsilon \ge 1 - 2\varepsilon.$$

Then $|\mathrm{vol}(G) - P_y[G(g(y)) = 1]| \le 2\varepsilon$. □
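The counting step in the proof rests on a greedy choice of disjoint terms: every term shares a variable with some picked term, so if fewer than $l$ terms are picked, they cover at most $(l-1)c$ variables, each in at most $d$ terms, and $G$ has at most $d(l-1)c = m'$ terms. A sketch (hypothetical helper, terms as lists of (variable, bit) literals):

```python
def greedy_disjoint_terms(G):
    """Pick a maximal family of pairwise variable-disjoint terms of G;
    every term of G then shares a variable with some picked term."""
    picked, used = [], set()
    for term in G:
        variables = {i for i, _ in term}
        if not variables & used:
            picked.append(term)
            used |= variables
    return picked
```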

It remains to show how to find such an $\varepsilon$-generator for all subformulas with at
most $m'$ terms of any $G \in c\mathrm{DNF}_d$. We will show that the generator of Luby,
Velickovic, and Wigderson [7] can be slightly modified to suit this purpose. For
more details, please refer to [7].

First, we fix the following parameters:

- $n' = cm'$,

- $b = \log(4n'm'/\varepsilon)$,

- $s = b^2$,

- $r = 24cb^3$,

- $\delta = \varepsilon / (4m'n'(3cr)^{\cdots})$.



Following [7], we want to construct a set system. For any subformula $T$ of
$G$ with $m'$ terms, we call a family of $n$ subsets $S_1, \dots, S_n \subseteq \{1, \dots, r\}$ good for
$T$ if they satisfy the following two conditions:

- for any variable $x_i$ of $T$, $|S_i| \le s$, and

- for any variable $x_i$ of $T$ and any term of $T$ with variables $\{x_j : j \in B\}$,
$|S_i \cap (\bigcup_{j \in B \setminus \{i\}} S_j)| \le b$.

We will choose them randomly from an approximately $2b$-wise independent space.
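Checking whether a given family $S_1, \dots, S_n$ is good for $T$ is straightforward. In the sketch below, indices are 0-based and each term of $T$ is represented by the set $B$ of its variable indices; the function name is hypothetical.

```python
def is_good_for(S, terms, s, b):
    """Check the two conditions for S_1, ..., S_n to be good for T.

    S     : list of sets S[i] (subsets of {1, ..., r}), one per variable x_i
    terms : the terms of T, each given as the set B of its variable indices
    """
    variables = set().union(*terms)
    if any(len(S[i]) > s for i in variables):
        return False
    for i in variables:
        for B in terms:
            others = set().union(*(S[j] for j in B if j != i))
            if len(S[i] & others) > b:
                return False
    return True
```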



Definition 2 An $(n, k, p, \delta)$ space consists of a sequence of $n$ binary random
variables $X_1, \dots, X_n$ such that for any $I \subseteq \{1, \dots, n\}$ with $|I| \le k$,

$$\left| P[\forall i \in I,\ X_i = 1] - p^{|I|} \right| \le \delta.$$
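For an explicitly listed sample space, the smallest $\delta$ for which it is an $(n, k, p, \delta)$ space can be computed by brute force over all index sets $I$. The sketch below only illustrates the definition; it does not reproduce the small-sample-space constructions of [3].

```python
from itertools import combinations


def max_bias(space, k, p):
    """Smallest delta for which the uniform distribution on `space` (a list of
    equally likely n-bit tuples) is an (n, k, p, delta) space: the maximum over
    nonempty I with |I| <= k of |P[X_i = 1 for all i in I] - p^|I||."""
    n, N = len(space[0]), len(space)
    worst = 0.0
    for size in range(1, k + 1):
        for I in combinations(range(n), size):
            prob = sum(all(x[i] == 1 for i in I) for x in space) / N
            worst = max(worst, abs(prob - p ** size))
    return worst
```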

Let $y_{ij}$, for $1 \le i \le n$ and $1 \le j \le r$, be sampled from an $(nr, 2b, \cdot, \delta)$ space.
It can be efficiently sampled using $v = O(\log\log n + b\log(\cdots) + \log(1/\delta))$
random bits [3]. Let $S_i(y) = \{j : y_{ij} = 1\}$. Let $T$ be any subformula of $G$ with at
most $m'$ terms and $n'$ variables. Then, by choosing $y$ randomly, the probability
that $S_1(y), \dots, S_n(y)$ are not good for $T$ is at most a sum of four terms, each of
which is at most $\varepsilon/4$ by the choice of the parameters above:

$$P[S_1(y), \dots, S_n(y) \text{ are not good for } T] \le \frac{\varepsilon}{4} + \frac{\varepsilon}{4} + \frac{\varepsilon}{4} + \frac{\varepsilon}{4} = \varepsilon.$$



Consider the generator $g : \{0,1\}^{r+v} \to \{0,1\}^n$, defined as follows:

$$g(w, y) = \Big(\bigoplus_{j \in S_1(y)} w_j, \dots, \bigoplus_{j \in S_n(y)} w_j\Big).$$
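A toy version of $g$ can make the construction concrete. It assumes the bits $y_{ij}$ are given directly (in the construction they are produced from $v$ random bits by the sampler of [3]) and uses 0-based indices; the function names are hypothetical.

```python
def sets_from_bits(y, n, r):
    """S_i(y) = {j : y_ij = 1}, with y a flat tuple and y_ij = y[i * r + j]
    (0-based indices)."""
    return [{j for j in range(r) if y[i * r + j] == 1} for i in range(n)]


def xor_generator(w, S):
    """Output bit i is the parity (XOR) of w_j over j in S_i."""
    out = []
    for S_i in S:
        bit = 0
        for j in S_i:
            bit ^= w[j]
        out.append(bit)
    return tuple(out)
```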
The proof in [7] can be used to show that $g$ is an $\varepsilon$-generator for all subformulas
of $G$ with at most $m'$ terms. From Lemma 3, $g$ is a $2\varepsilon$-generator for $G \in c\mathrm{DNF}_d$.
From Lemma 2, we can get a $2k\varepsilon$-generator for $F \in t\mathrm{DNF}_d$. The number of
random bits used is $k(r + v)$.

With $\varepsilon$ replaced by $\varepsilon/(2k)$, we have the following:

Lemma 4. Given $F \in t\mathrm{DNF}_d$, we can construct an $\varepsilon$-generator for $F$ using
$O(\cdots)$ random bits.

To summarize, given a DNF formula $F$, we do the following:

- Remove those terms with more than $t = \log(m/\varepsilon)$ variables.

- Determine the parameter $d$, the maximum number of appearances of any
variable.

- Run the hypergraph coloring algorithm to partition $F$ into $k$ $c\mathrm{DNF}_d$ formulas.

- Construct the generators $g_1, \dots, g_k$ and the generator $g$.

- Compute the average of $F$ under the pseudorandom distribution generated
by $g$.

So we have the following theorem.

Theorem 1 Given a DNF formula $F$, we can approximate its volume with error
$\varepsilon$ in deterministic $2^{(\cdots)}$ time.



3.2 Construction II, $k = \cdots$ and $c = O(\sqrt{\log(dt)})$

This is the framework used by Luby and Velickovic [6], but we use our hypergraph
coloring algorithm instead. Let $l = \lceil \log(\cdots) \rceil c 2^c$ and $\delta = \cdots$. Let
$h : \{0,1\}^r \to \{0,1\}^n$ be the mapping that generates the $(n, l, 1/2, \delta)$ space. Then

Lemma 5. [6] $h$ is an $(\varepsilon/k)$-generator for $c\mathrm{DNF}$.

From Lemma 2, we have an $\varepsilon$-generator using $kr = O(k(l + \log\log n + \log(1/\delta))) = O(kl)$
random bits. So we have the following:

Lemma 6. Given $F \in t\mathrm{DNF}_d$, we can construct an $\varepsilon$-generator for $F$ using
$O(\cdots)$ random bits.

Similarly, we have the following theorem. 



Theorem 2 Given a DNF formula $F$, we can approximate its volume with error $\epsilon$ in deterministic $2^{(\cdots)}$ time.
4 Dimensions of Posets

In this section, we will constructivize existential bounds, given by Füredi and Kahn [5], on the dimensions of posets.

Definition 3 [5] Let $(P,<)$ be a poset. Its dimension, denoted $\dim(P)$, is defined to be the minimum number of permutations $L_1, \ldots, L_d$ such that $P = L_1 \cap \cdots \cap L_d$, i.e.,

$$\forall x,y \in P,\quad x < y \iff \forall i,\ x <_{L_i} y. \qquad (2)$$

Definition 4 [5] For $x \in P$, define $U(x) = \{y \in P : y > x\}$, $L(x) = \{y \in P : y < x\}$, and $C(x) = U(x) \cup L(x)$. Define $u = \max\{|U(x)| : x \in P\}$, $l = \max\{|L(x)| : x \in P\}$, and $t = \max\{|C(x)| : x \in P\}$.

Lemma 7. [5] The dimension of $(P,<)$ is equal to the minimum number of permutations $\pi_1, \ldots, \pi_d$ such that

$$\forall x,y \in P,\quad y \not< x \implies \exists i,\ x <_{\pi_i} U(y), \qquad (3)$$

where $x <_{\pi} U(y)$ means that $\pi$ places $x$ ahead of every element of $U(y)$. In addition, there is a deterministic polynomial time algorithm for converting a set of $d$ permutations satisfying condition (3) to a set of $d$ permutations satisfying condition (2).

Füredi and Kahn [5] gave a simple upper bound: $\dim(P) = O(u\log|P|)$. Their argument can be turned into a deterministic algorithm.

Theorem 3 Given any poset $(P,<)$, we can find a set of $O(u\log|P|)$ permutations satisfying condition (2) in deterministic polynomial time.

Proof: We say that a pair $(x,y) \in P^2$ is killed by a permutation $\pi$ if $x <_{\pi} U(y)$. We want to pick $d = O(u\log|P|)$ permutations one by one, in $d$ phases. In phase $i$, find a $\pi_i$ that kills at least a $1/(u+1)$ fraction of the pairs not killed by $\pi_1, \ldots, \pi_{i-1}$. Such a $\pi_i$ exists because the expected fraction killed by a random permutation is at least $1/(u+1)$: a pair $(x,y)$ is killed exactly when $x$ comes first among the at most $u+1$ elements of $\{x\} \cup U(y)$. We can find such a $\pi_i$ by fixing its components one by one, in $|P|$ steps, again using the technique of conditional probabilities. Then after $d$ phases, the number of pairs not killed yet is at most

$$|P|^2\left(1 - \frac{1}{u+1}\right)^{d} < 2^{2\log|P| - d/(u+1)} \le 1$$

for some $d = O(u\log|P|)$. These $d$ permutations satisfy condition (3) and can be converted to $d$ permutations satisfying condition (2). It is easy to see that the whole process can be done in deterministic polynomial time. □
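The phase structure of this proof can be sketched directly. In the sketch below the poset is given by the sets $U(y)$, and each permutation is built position by position, always choosing the next element that maximizes the conditional expectation of the number of newly killed pairs (the method of conditional probabilities). The function names are ours, and the conversion of Lemma 7 from condition (3) to condition (2) is not implemented. Because the chosen value never drops below the initial expectation, which is at least (number of live pairs)$/(u+1) > 0$, each phase kills at least one pair and the loop terminates.

```python
from fractions import Fraction
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def required_pairs(P: Sequence[Hashable], up: Dict[Hashable, Set[Hashable]]) -> List[Pair]:
    """All pairs (x, y) with y not< x, i.e. x not in U(y); condition (3) covers these."""
    return [(x, y) for x in P for y in P if x not in up[y]]


def kills(order: Sequence[Hashable], x, y, up) -> bool:
    """True if x precedes every element of U(y) in the permutation `order`."""
    pos = {z: i for i, z in enumerate(order)}
    return all(pos[x] < pos[z] for z in up[y])


def conditional_expectation(pos: Dict, pairs: List[Pair], up) -> Fraction:
    """Expected number of killed pairs if the unplaced elements follow in uniformly random order."""
    total = Fraction(0)
    for x, y in pairs:
        group = {x} | up[y]
        placed = [z for z in group if z in pos]
        if placed:
            # x is killed iff it is the earliest placed member of its group
            total += 1 if min(placed, key=pos.__getitem__) == x else 0
        else:
            total += Fraction(1, len(group))
    return total


def derandomized_permutation(P, pairs, up) -> List:
    """Fix positions one at a time, never letting the conditional expectation drop."""
    pos: Dict = {}
    order: List = []
    remaining = list(P)
    while remaining:
        best, best_val = None, None
        for z in remaining:
            pos[z] = len(order)
            val = conditional_expectation(pos, pairs, up)
            del pos[z]
            if best_val is None or val > best_val:
                best, best_val = z, val
        pos[best] = len(order)
        order.append(best)
        remaining.remove(best)
    return order


def greedy_family(P, up) -> List[List]:
    """The phases of Theorem 3: add permutations until every required pair is killed."""
    alive = required_pairs(P, up)
    family = []
    while alive:
        pi = derandomized_permutation(P, alive, up)
        family.append(pi)
        alive = [(x, y) for (x, y) in alive if not kills(pi, x, y, up)]
    return family


# Divisibility order on {1, 2, 3, 6}: U(y) = proper multiples of y.
P = [1, 2, 3, 6]
up = {y: {z for z in P if z != y and z % y == 0} for y in P}
family = greedy_family(P, up)
```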

Füredi and Kahn [5] gave another upper bound: $\dim(P) = O(t\log^2 t)$. Again, their argument can be turned into a deterministic algorithm. Assume without loss of generality that $\ldots < |P|$ (otherwise, we can just use the previous theorem).

First, a simple lemma. 

Lemma 8. [5] Given a hypergraph $H(V,E)$ in which every edge has at most $a$ vertices and every vertex has degree at most $b$, there exists a coloring with $(a-1)b+1$ colors such that no color appears more than once in an edge.

Such a coloring can easily be found using a greedy algorithm.
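The greedy algorithm behind Lemma 8 can be sketched as follows. Each vertex shares an edge with at most $(a-1)b$ other vertices, so giving each vertex the smallest color not used by an already-colored neighbour never needs more than $(a-1)b+1$ colors. The function name is ours.

```python
from typing import Dict, Hashable, Iterable, List, Set


def greedy_strong_coloring(vertices: List[Hashable],
                           edges: Iterable[Set[Hashable]]) -> Dict[Hashable, int]:
    """Color vertices so that no color repeats inside any edge."""
    neighbours: Dict[Hashable, Set[Hashable]] = {v: set() for v in vertices}
    for e in edges:
        for v in e:
            neighbours[v] |= set(e) - {v}
    color: Dict[Hashable, int] = {}
    for v in vertices:
        used = {color[u] for u in neighbours[v] if u in color}
        c = 0
        while c in used:  # smallest color unused by an already-colored neighbour
            c += 1
        color[v] = c
    return color


edges = [{0, 1, 2}, {2, 3, 4}, {4, 5, 0}]  # a = 3, b = 2, so at most 5 colors
coloring = greedy_strong_coloring(list(range(6)), edges)
```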

Theorem 4 Given any poset $(P,<)$, we can find a set of $O(t\log^2 t)$ permutations satisfying condition (2) in deterministic polynomial time.



Proof: Consider the hypergraph $H(V,E)$ where $V = P$ and $E = \{U(y) : y \in P\} \subseteq \binom{V}{\le t}$, with degree at most $t$. Using our hypergraph coloring algorithm, we can color nodes using $k = O(t/c)$ colors such that no color appears more than $c = O(\log t)$ times in an edge. $H$ is now partitioned into $k$ hypergraphs $H_1(V_1,E_1), \ldots, H_k(V_k,E_k)$ in the obvious way.

We will have $k$ groups $G_1, \ldots, G_k$ of permutations, with the permutations in group $G_i$ designed to kill those pairs $(x,y) \in P^2$ with $x \in V_i$. Permutations in $G_i$ place $V_i$ ahead of $V \setminus V_i$, and use an arbitrary permutation $\tau_i$ for $V \setminus V_i$. It remains to guarantee that for each $(x,y) \in P^2$ with $x \in V_i$ and $y \not< x$, there exists a permutation $\pi \in G_i$ such that $x <_{\pi} U(y) \cap V_i$. Notice that we have reduced the original problem to $k$ subproblems.

For $H_i$, use Lemma 8 to color the nodes in $V_i$ with $r = O(ct) = O(t\log t)$ colors such that all colors in an edge are distinct. Let $V_{ij}$, $1 \le j \le r$, denote the nodes in $V_i$ with color $j$. Notice that the order among $V_{ij}$ does not matter, and we fix an arbitrary permutation $R_{ij}$ on $V_{ij}$, together with its converse $\overline{R}_{ij}$. It remains to find a set of permutations on $r$ colors such that for any set $S$ of $c$ colors and any color $a \notin S$, some permutation puts $a$ ahead of $S$. A random collection of $s = O(c^2\log r)$ permutations will fail with probability at most

$$r\binom{r-1}{c}\left(1 - \frac{1}{c+1}\right)^{s} < 1.$$



Using a similar idea to that in Theorem 3, a good set of permutations $\gamma_1, \ldots, \gamma_s$ can be found one by one, each in deterministic polynomial time. Then for each $i$ and $l$ with $1 \le i \le k$ and $1 \le l \le s$, we define two permutations:

$$\pi_{i,l} = (R_{i,\gamma_l(1)}, R_{i,\gamma_l(2)}, \ldots, R_{i,\gamma_l(r)}, \tau_i) \quad\text{and}\quad \pi'_{i,l} = (\overline{R}_{i,\gamma_l(1)}, \overline{R}_{i,\gamma_l(2)}, \ldots, \overline{R}_{i,\gamma_l(r)}, \tau_i).$$

So the total number of permutations is $2sk = O(t\log^2 t)$, and they can be found in deterministic polynomial time. □
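For small $r$ and $c$, the property required of $\gamma_1, \ldots, \gamma_s$ can be checked by brute force. The sketch below draws random permutations and verifies them exhaustively. This is a randomized stand-in for the deterministic one-by-one search in the proof, and the names `puts_ahead`, `sample_family` and `expand` are ours. `expand` assembles $\pi_{i,l}$ (or $\pi'_{i,l}$ with `reverse=True`) from a color order, the classes $V_{ij}$ listed in their fixed order $R_{ij}$, and $\tau_i$. With $r = 6$, $c = 2$, $s = 20$ the union bound above is $6\binom{5}{2}(2/3)^{20} \approx 0.018$, so a single draw almost always succeeds.

```python
import random
from itertools import combinations
from typing import Dict, List, Sequence


def puts_ahead(perms: List[List[int]], r: int, c: int) -> bool:
    """For every color a and every c-set S of other colors, some permutation has a before all of S."""
    positions = [{col: i for i, col in enumerate(p)} for p in perms]
    for a in range(r):
        others = [b for b in range(r) if b != a]
        for S in combinations(others, c):
            if not any(all(pos[a] < pos[b] for b in S) for pos in positions):
                return False
    return True


def sample_family(r: int, c: int, s: int, seed: int = 0, tries: int = 1000) -> List[List[int]]:
    """Draw s random permutations of the r colors until the property holds (randomized)."""
    rng = random.Random(seed)
    for _ in range(tries):
        perms = [rng.sample(range(r), r) for _ in range(s)]
        if puts_ahead(perms, r, c):
            return perms
    raise RuntimeError("no good family found; increase s")


def expand(gamma: Sequence[int], classes: Dict[int, List], tau: Sequence,
           reverse: bool = False) -> List:
    """The classes V_{i,gamma(1)}, ..., V_{i,gamma(r)} (each in order R_ij or its converse), then tau."""
    order: List = []
    for col in gamma:
        block = classes[col]
        order.extend(reversed(block) if reverse else block)
    return order + list(tau)


fam = sample_family(6, 2, 20)
```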

Note that a more careful analysis shows that actually we can find a set of 
0(tlognlog/) of permutations satisfying condition (2). 
Abstract. We construct a pseudo-random generator for space bounded computations using the extractor of Zuckerman [7]. For machines that use $S$ space and $R \le 2^{S^{1-\epsilon}}$ random bits for $\epsilon > 0$, the generator uses a seed of length $O((S\log R)/\log S)$, which is shorter than the seed of both the generator of Nisan [4] and the generator of Nisan and Zuckerman [5]. We then use this generator to derandomize these machines in space $O(S\sqrt{\log R/\log S})$, which is better than the derandomization of [6].



1 Introduction 

One of the important resources that complexity theory tries to measure besides time and space is the number of random bits needed to solve a given problem using a certain probabilistic model. Upper bounds are usually gained by introducing a deterministic algorithm $G$ that gets a short random input $s$ and outputs a long string of length $R \gg |s|$ that fools the model. By fooling a probabilistic computational model we mean that its behavior on a truly random input of length $R$ is statistically very close to its behavior on the output of $G$ when $s$ is truly random. The algorithm $G$ is called a pseudo-random generator for that model.

The computational model that we try to fool here is space-bounded computations, i.e., the complexity class BPSPACE(S), which contains all the problems solvable by a Turing machine in space $S$ using random bits with bounded two-sided error. The two best pseudo-random generators known today for space-bounded computations are [4] and [5]. In [4] Nisan shows a generator that fools any space-$S$ machine that requires $R \le \cdots$ random bits using a seed of length $O(S\log R)$. However, when $R = \mathrm{poly}(S)$, Nisan and Zuckerman [5] introduce a more powerful generator which requires only $O(S)$ random bits for its seed. Our generator combines ideas from the two generators of [5] and [3], and almost interpolates the whole gap between the two results above. For the rest of the paper denote $e(z) = 4^{\log^* z}\log z$. We show that for $R \le \cdots$ there exists a pseudo-random generator for space-$S$ computations with a seed of length $O\left(\frac{(S + e(\cdots))\log R}{\max\{1, \log S - \log\log R\}}\right)$ and statistical error $\cdots$. This result is better than both earlier results for $R$ such that $\cdots < R < 2^{S^{1-\epsilon}}$ for $\epsilon > 0$. For $R = S^k$,

M. Luby, J. Rolim, and M. Serna (Eds.): RANDOM’98, LNCS 1518, pp. 47-59, 1998. 
the generator of Nisan and Zuckerman uses a seed of length $\cdots$, while our generator $G$ uses a seed of length $O(kS)$; and for $R \le 2^{S^{1-\epsilon}}$ for some $\epsilon > 0$, the length of the seed is only $O(S\log R/\log S)$, while the generator of Nisan needs a seed of length $\Theta(S\log R)$. Note also that for $\cdots < R < \cdots$ our generator gives an estimation of the accepting probability of a BPSPACE(S) machine in less space than [6], since they use $\Theta(S\sqrt{\log R})$ space, which is larger than our seed.

We would like to remark here that ideally, the "error term" $e(R)$ should simply be $\log R$, but unfortunately this $e(R)$ term arises in the known constructions of extractors, and no better one is known yet.

The best derandomization known so far for BPSPACE(S) is the one of Saks and Zhou [6]. They proved that any problem that is randomly solvable in space $S$ using $R \le 2^{\cdots}$ random bits can be solved deterministically in space $O(S\sqrt{\log R})$ as well. In this paper we show that this task can be done in space $O\left(\frac{(S + e(R))\sqrt{\log R}}{\sqrt{\max\{1, \log S - \log\log R\}}}\right)$. For $R \le 2^{S^{1-\epsilon}}$ for some $\epsilon > 0$, the space required according to our result is only $O(S\sqrt{\log R/\log S})$, which is less by a factor of $\sqrt{\log S}$ than the space used by Saks and Zhou [6].

2 Preliminaries 

2.1 Definitions and Notations 

Let $x \in \mathbb{R}^l$ be a vector; then its $i$th entry is denoted by $x[i]$. The $L_1$-norm of $x$ is $\|x\| = \sum_i |x[i]|$. Let $M$ be an $l \times l$ matrix over $\mathbb{R}$; then its $L_1$-norm is $\|M\| = \sup\{\|xM\| : x \in \mathbb{R}^l, \|x\| = 1\}$. The statistical distance between two distributions $D$ and $D'$ on the space $\{0,1\}^m$ is $\|D - D'\|$.

Let $A_1$ and $A_2$ be two events in a probability space. We denote by $\Pr[A_1 : A_2]$ the probability of $A_1$ conditioned upon $A_2$. Let $f : W \to R$ be a function, and let $X$ be a random variable distributed on $W$; then $\mathbb{E}_{x\in X}[f(x)]$ denotes the expectation of $f(X)$. We denote by $D_f$ the distribution over $R$ of the random variable $f(Y)$, where $Y$ is a random variable uniformly distributed over $W$. If $f : W \to R$ and $g : U \to W$, then $f \circ g$ is the composition of $f$ and $g$; thus, $D_{f\circ g}$ is the distribution over $R$ of the random variable $f(g(Y))$, where $Y$ is a random variable uniformly distributed over $U$.



2.2 Extractors 

Definition 1. [5] A distribution $D$ on $\{0,1\}^n$ is called a $\delta$-source if for all $x \in \{0,1\}^n$, $D(x) \le 2^{-\delta n}$.

Definition 2. [7, Definition 1.4] A function $E : \{0,1\}^k \times \{0,1\}^t \to \{0,1\}^m$ is called a $(k, \delta, t, m, \epsilon)$-extractor if for every $\delta$-source $D$ on $\{0,1\}^k$, if $X$ is a random variable distributed according to $D$ and $Y$ is uniformly distributed on $\{0,1\}^t$, then the distribution of $E(X,Y) \circ Y$ is within statistical distance $\epsilon$ of the uniform distribution on $\{0,1\}^{m+t}$.
Lemma 1. There is a constant $c_1 > 0$ such that for every integer $k$ and every $\epsilon > 2^{-\cdots}$ there exists a $\left(k, \frac{1}{2}, c_1\log(\cdots), \frac{k}{4}, \epsilon\right)$-extractor which runs in NC.

For the proof, refer to [7, Theorem 3.13, Remark 3.14] and substitute $n = k$, $\delta = \frac{1}{2}$ and $\alpha = \frac{1}{2}$. Note that although the original proof uses a different definition of the statistical distance between distributions, which differs from ours by a factor of 2, the Lemma still holds.



2.3 Deterministic Sampling 

Definition 3. An $(\epsilon, \gamma)$ sampler (oblivious sampler) $A : \{0,1\}^m \times [k] \to \{0,1\}^l$ is a deterministic algorithm such that for every function $f : \{0,1\}^l \to [0,1]$,

$$\Pr_{y\in\{0,1\}^m}\Big[\big|\mathbb{E}_{i\in[k]}[f(A(y,i))] - \mathbb{E}_{x\in\{0,1\}^l}[f(x)]\big| > \epsilon\Big] \le \gamma.$$

An efficient construction of a (non-oblivious) sampler is presented in [2]. For the proof of the following Lemma, refer to [7, Theorem 5.5] and substitute $d = k$, $\alpha = \cdots$ and $\gamma = 2^{-\cdots}$.

Lemma 2. For every integer $l$, every constant $b < 1$ and every $\epsilon > 2^{-\cdots}$, there is an $(\epsilon, 2^{-\cdots})$ oblivious sampler $A : \{0,1\}^{\cdots} \times [k] \to \{0,1\}^l$, running in NC, where $k = \cdots$.

2.4 Space Bounded Computations and Branching Programs 

In this subsection we show how to simulate a space bounded computation by 
branching programs (BPs) in order to conclude that a pseudo-random generator 
for BPs fools space bounded machines as well. For that purpose we need the 
following definition of oblivious read-once BP. 

Definition 4. A $(w,n,r)$-BP (branching program) is a graph of $n+1$ layers, indexed by $0, \ldots, n$, with $w$ vertices at each layer (length $n$ and width $w$). For each vertex $v$ at layer $i < n$ there are $2^r$ outgoing multi-edges to vertices at layer $i+1$, where each of these edges is labeled by a distinct string from $\{0,1\}^r$. Upon receiving an input string from $\{0,1\}^{rn}$, which can be viewed as $n$ blocks of length $r$, denoted $R_1, \ldots, R_n$, all edges are deleted from the BP except for the edges between layer $i-1$ and $i$ labeled by $R_i$, for every $i \le n$. Let $s$ be one of the vertices at layer 0; we call $s$ the initial vertex. The value computed by the BP is the index (in $[w]$) of the last vertex in the path that was started from $s$. Thus, a $(w,n,r)$-BP can be viewed as a function from $\{0,1\}^{rn}$ to $[w]$.
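Evaluating a $(w,n,r)$-BP on an input is a single walk through the layers. The sketch below assumes a table representation of our own choosing: `transitions[i][u][label]` is the vertex reached at layer $i+1$ from vertex $u$ at layer $i$, where `label` is an $r$-bit block read as an integer.

```python
from typing import List, Sequence

# transitions[i][u][label] = vertex at layer i+1 reached from vertex u at layer i
Transitions = List[List[List[int]]]


def run_bp(transitions: Transitions, start: int, blocks: Sequence[int]) -> int:
    """Follow the edges labeled R_1, ..., R_n from the initial vertex; return the final vertex."""
    if len(blocks) != len(transitions):
        raise ValueError("need exactly one r-bit block per layer")
    v = start
    for layer, label in zip(transitions, blocks):
        v = layer[v][label]
    return v


# A (2, 3, 1)-BP computing the parity of its three input bits.
parity = [[[0, 1], [1, 0]] for _ in range(3)]
```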

Now, we define a pseudo-random generator for BPs. 

Definition 5. A function $G : \{0,1\}^l \to \{0,1\}^{rn}$ is called a $\gamma$-pseudo-random generator for $(w,n,r)$-BPs if for every $(w,n,r)$-BP $P$, $\|D_P - D_{P\circ G}\| \le \gamma$.
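For tiny parameters, the quantity $\|D_P - D_{P\circ G}\|$ in Definition 5 can be computed by enumerating every input and every seed. The sketch below uses the $L_1$ norm of Section 2.1 (without a factor 1/2); the two toy generators in the check are hypothetical examples, one that preserves the parity distribution and one that destroys it.

```python
from collections import Counter
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Transitions = List[List[List[int]]]  # transitions[i][u][label] -> vertex at layer i+1


def run_bp(transitions: Transitions, start: int, blocks: Sequence[int]) -> int:
    v = start
    for layer, label in zip(transitions, blocks):
        v = layer[v][label]
    return v


def output_distribution(transitions, start, inputs) -> Dict[int, float]:
    """Distribution of the final vertex when the input is uniform over `inputs`."""
    counts = Counter(run_bp(transitions, start, b) for b in inputs)
    total = sum(counts.values())
    return {v: cnt / total for v, cnt in counts.items()}


def l1_distance(d1: Dict[int, float], d2: Dict[int, float]) -> float:
    """||D - D'|| as the L1 norm of Section 2.1."""
    return sum(abs(d1.get(k, 0.0) - d2.get(k, 0.0)) for k in set(d1) | set(d2))


def bp_distance(transitions, start, r: int,
                gen: Callable[[Tuple[int, ...]], List[int]], seed_len: int) -> float:
    """||D_P - D_{P o G}|| by brute force over all inputs and all seeds."""
    n = len(transitions)
    uniform = list(product(range(2 ** r), repeat=n))
    pseudo = [gen(s) for s in product((0, 1), repeat=seed_len)]
    return l1_distance(output_distribution(transitions, start, uniform),
                       output_distribution(transitions, start, pseudo))


parity = [[[0, 1], [1, 0]] for _ in range(3)]  # (2, 3, 1)-BP computing parity
```

Repeating one seed bit three times keeps the parity uniform (distance 0), while `lambda s: [s[0], s[0], 0]` always yields even parity (distance 1).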
Definition 6. A function $G : \{0,1\}^l \to \{0,1\}^R$ is called a $\gamma$-pseudo-random generator for space-$S$ machines that use $R$ random bits if for every machine $M$ that runs in space $S$, uses $R$ random bits and outputs $\{\text{accept}, \text{reject}\}$,

$$\|D_M - D_{M\circ G}\| \le \gamma.$$

Lemma 3. Let $S, R > 0$ be integers and let $\alpha > 0$. Then, if $G : \{0,1\}^l \to \{0,1\}^R$ is an $\alpha$-pseudo-random generator for $(2^S, R, 1)$-BPs, then $G$ is an $\alpha$-pseudo-random generator for space-$S$ machines that use $R$ random bits.

Proof: Let $M$ be a space-$S$ machine that uses $R$ random bits. Following [3], we simulate $M$ by a $(2^S, R, 1)$-BP $P$. At each layer of $P$, the $2^S$ different vertices correspond to the different configurations of $M$. An edge from vertex $v$ at layer $i$, labeled by $x \in \{0,1\}$, ends at vertex $u$ at layer $i+1$ iff the input $x$ on the random tape of $M$ takes it from configuration $v$ to $u$. Let $s$ be the initial configuration of $M$. It follows that $P$ computes the ending configuration of $M$. Now, since the output of $M$ is a mapping from configurations to $\{\text{accept}, \text{reject}\}$, we get that for every $G : \{0,1\}^l \to \{0,1\}^R$, $\|D_M - D_{M\circ G}\| \le \|D_P - D_{P\circ G}\|$. We conclude that if $G$ is $\alpha$-pseudo-random for $(2^S, R, 1)$-BPs, then $G$ is $\alpha$-pseudo-random for space-$S$ machines that use at most $R$ random bits. □

Remark 1. If $G : \{0,1\}^l \to \{0,1\}^{rn}$ is a $\gamma$-pseudo-random generator for $(w,n,r)$-BPs, and if for some $r' < r$, $G' : \{0,1\}^l \to \{0,1\}^{r'n}$ is defined to output the first $r'$ bits of every output block of $G$, then $G'$ is a $\gamma$-pseudo-random generator for $(w,n,r')$-BPs.

3 Pseudo-randomness 

In this section we show a pseudo-random generator that fools any $(w,n,r)$-BP $P$. Our building block will be the extractor from Subsection 2.2.



3.1 Pseudo-randomness for Branching Programs 

Definition 7. Let $c_1$ be the constant of Lemma 1. Given $n$, $w$, $r$ and $\gamma$, fix the following parameters: let $\epsilon = \frac{\gamma}{2n}$, $k = \max\{4r, 2\log w + e(\cdots)\}$ (we remind the reader that $e(z) = 4^{\log^* z}\log z$) and $t = c_1\log(\cdots)$. Let $E$ be the $(k, \frac{1}{2}, t, r, \epsilon)$-extractor of Lemma 1. Define a generator $G : \{0,1\}^{k+nt} \to \{0,1\}^{nr}$ as follows:

$$\forall x \in \{0,1\}^k,\ y_1, \ldots, y_n \in \{0,1\}^t:\qquad G(x, y_1, \ldots, y_n) = E(x, y_1), \ldots, E(x, y_n).$$

For every $y_1, \ldots, y_n \in \{0,1\}^t$, we denote by $G_{(y_1,\ldots,y_n)} : \{0,1\}^k \to \{0,1\}^{nr}$ the function

$$G_{(y_1,\ldots,y_n)}(x) = G(x, y_1, \ldots, y_n).$$
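The shape of $G$ is one shared weak source $x$ reused with $n$ independent short seeds. Zuckerman's extractor [7] is too involved to reproduce here, so the sketch below swaps in Toeplitz-matrix hashing over GF(2): a universal hash family that, by the leftover hash lemma, is also a strong extractor, but with seed length $k + r - 1$ rather than $t = O(\log(\cdots))$. It therefore illustrates the structure of $G$ only, not its parameters, and `toeplitz_hash` and `extractor_generator` are hypothetical names.

```python
from typing import List, Sequence


def toeplitz_hash(x: Sequence[int], y: Sequence[int], r: int) -> List[int]:
    """Stand-in extractor: r output bits T_y x over GF(2), T_y the r x k Toeplitz matrix given by y."""
    k = len(x)
    if len(y) != k + r - 1:
        raise ValueError("seed must have length k + r - 1")
    out = []
    for i in range(r):
        bit = 0
        for j in range(k):
            bit ^= y[i - j + k - 1] & x[j]  # entry T[i][j] depends only on i - j
        out.append(bit)
    return out


def extractor_generator(x: Sequence[int], ys: Sequence[Sequence[int]], r: int) -> List[List[int]]:
    """G(x, y_1, ..., y_n) = E(x, y_1), ..., E(x, y_n): n output blocks of r bits each."""
    return [toeplitz_hash(x, y, r) for y in ys]
```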

Lemma 4. For every $n$, $w$, $r$ and $\gamma$, for every $(w,n,r)$-BP $P$ and for every random variable $Y$ uniformly distributed on $\{0,1\}^{tn}$, $\mathbb{E}_{y\in Y}\big[\|D_P - D_{P\circ G_{(y)}}\|\big] \le \gamma$.
A similar version of this Lemma has already been proven in [5, Lemma 2] for space-bounded machines, while we state our Lemma for BPs. Another difference is that their Lemma is weaker, since they only proved that

$$\big\|\mathbb{E}_{y\in Y}[D_P - D_{P\circ G_{(y)}}]\big\| = \|D_P - D_{P\circ G}\| \le \gamma.$$

Proof: Let $P$ be a $(w,n,r)$-BP. We denote by $S_i(u,v) \subseteq \{0,1\}^r$ the set of labels of edges going from vertex $u$ at layer $i-1$ to vertex $v$ at layer $i$ of $P$. We define the following $w \times w$ stochastic matrices. $M_i$ is the matrix with $|S_i(u,v)|/2^r$ at its $(u,v)$ coordinate, and for every $x \in \{0,1\}^k$ and $y \in \{0,1\}^t$ we define the matrix $M_i(x,y)$ with 1 at its $(u,v)$ coordinate if $E(x,y) \in S_i(u,v)$ and 0 otherwise. Let $u_0$ be the initial state of $P$, and let $e$ be the unit vector of length $w$ with 1 at its $u_0$ coordinate and 0 elsewhere.

Let $X$ be a random variable uniformly distributed on $\{0,1\}^k$. Notice that $D_P = e\prod_{i=1}^{n} M_i$ and that for every $y_1, \ldots, y_n \in \{0,1\}^t$,

$$D_{P\circ G_{(y_1,\ldots,y_n)}} = \mathbb{E}_{x\in X}\Big[e\prod_{i=1}^{n} M_i(x, y_i)\Big].$$
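The identity $D_P = e\prod_{i=1}^{n} M_i$ can be checked numerically. The sketch below builds each $M_i$ from the transition table of the BP (the table format is our assumption, as in the earlier sketches) and multiplies the unit vector $e$ through the layers.

```python
from typing import List

Transitions = List[List[List[int]]]  # transitions[i][u][label] -> vertex at layer i+1


def layer_matrix(layer: List[List[int]], w: int, r: int) -> List[List[float]]:
    """M_i[u][v] = |S_i(u, v)| / 2^r: the fraction of r-bit labels leading from u to v."""
    M = [[0.0] * w for _ in range(w)]
    for u in range(w):
        for label in range(2 ** r):
            M[u][layer[u][label]] += 1 / 2 ** r
    return M


def vec_mat(vec: List[float], M: List[List[float]]) -> List[float]:
    w = len(M)
    return [sum(vec[u] * M[u][v] for u in range(w)) for v in range(w)]


def bp_output_distribution(transitions: Transitions, start: int, w: int, r: int) -> List[float]:
    """D_P = e M_1 ... M_n, with e the unit vector of the initial vertex."""
    e = [0.0] * w
    e[start] = 1.0
    for layer in transitions:
        e = vec_mat(e, layer_matrix(layer, w, r))
    return e


# Width 2, n = 2, r = 1: vertex 0 = "all ones so far", vertex 1 = "saw a zero".
all_ones = [[[1, 0], [1, 1]] for _ in range(2)]
```

Here `bp_output_distribution(all_ones, 0, 2, 1)` gives $[0.25, 0.75]$, matching the direct count (only the input $11$ ends at vertex 0).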

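Let $Y_1, \ldots, Y_n$ be random variables uniformly distributed on $\{0,1\}^t$. We will show by induction on $j$ that

$$\mathbb{E}_{y_1\in Y_1,\ldots,y_j\in Y_j}\Big[\Big\|e\Big(\prod_{i=1}^{j} M_i\Big) - \mathbb{E}_{x\in X}\Big[e\prod_{i=1}^{j} M_i(x, y_i)\Big]\Big\|\Big] \le \frac{j\gamma}{n}.$$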

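For $j = 0$ we trivially get equality. For $j > 0$, assume correctness for $j-1$ and prove for $j$:

$$\begin{aligned}
&\mathbb{E}_{y_1\in Y_1,\ldots,y_j\in Y_j}\Big[\Big\|e\Big(\prod_{i=1}^{j} M_i\Big) - \mathbb{E}_{x\in X}\Big[e\prod_{i=1}^{j} M_i(x, y_i)\Big]\Big\|\Big]\\
&\quad\le \mathbb{E}_{y_1\in Y_1,\ldots,y_j\in Y_j}\Big[\Big\|e\Big(\prod_{i=1}^{j-1} M_i - \mathbb{E}_{x\in X}\Big[\prod_{i=1}^{j-1} M_i(x, y_i)\Big]\Big)M_j\Big\|\Big]\\
&\qquad+ \mathbb{E}_{y_1\in Y_1,\ldots,y_j\in Y_j}\Big[\Big\|e\,\mathbb{E}_{x\in X}\Big[\Big(\prod_{i=1}^{j-1} M_i(x, y_i)\Big)(M_j - M_j(x, y_j))\Big]\Big\|\Big]\\
&\quad\le \frac{(j-1)\gamma}{n} + \mathbb{E}_{y_1\in Y_1,\ldots,y_j\in Y_j}\Big[\Big\|\mathbb{E}_{x\in X}\big[v(x, y_1, \ldots, y_{j-1})(M_j - M_j(x, y_j))\big]\Big\|\Big]
\end{aligned}$$

where in the last inequality, the first summand is bounded by the induction hypothesis and $\|M_j\| = 1$ as it is stochastic, and in the second summand, $v(x, y_1, \ldots, y_{j-1}) = e\prod_{i=1}^{j-1} M_i(x, y_i)$. Note that $v(x, y_1, \ldots, y_{j-1})$ is a distribution vector over $[w]$. To conclude the proof, it is enough to show that for every $y_1, \ldots, y_{j-1} \in \{0,1\}^t$,

$$\mathbb{E}_{y_j\in Y_j}\Big[\Big\|\mathbb{E}_{x\in X}\big[v(x, y_1, \ldots, y_{j-1})(M_j - M_j(x, y_j))\big]\Big\|\Big] \le \frac{\gamma}{n}.$$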
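For the rest of the proof we fix the values of $y_1, \ldots, y_{j-1}$.

Let $V$ be a random variable distributed on $[w]$ according to the distribution $\mathbb{E}_{x\in X}[v(x, y_1, \ldots, y_{j-1})]$. Let $L = \{v \in [w] : \Pr[V = v] > \epsilon/w\}$.

For $v \in L$, $\Pr[X = x : V = v] \le \frac{\Pr[X = x]}{\Pr[V = v]} < \frac{w}{\epsilon}\Pr[X = x] \le 2^{k/2}\Pr[X = x]$. We get that the distribution of $X$ conditioned upon $V = v$ is a $\frac{1}{2}$-source. Since $E$ is an extractor, we get that for a random variable $Z$ which is uniformly distributed on $\{0,1\}^r$,

$$\sum_{z\in\{0,1\}^r}\ \sum_{y_j\in\{0,1\}^t}\Big|\Pr[(Z = z)\wedge(Y_j = y_j)] - \Pr[(E(X, Y_j) = z)\wedge(Y_j = y_j) : V = v]\Big| \le \epsilon;$$

thus,

$$\begin{aligned}
&\mathbb{E}_{y_j\in Y_j}\Big[\Big\|\mathbb{E}_{x\in X}\big[v(x, y_1, \ldots, y_{j-1})(M_j - M_j(x, y_j))\big]\Big\|\Big]\\
&\quad= \mathbb{E}_{y_j\in Y_j}\Big[\Big\|\sum_{v\in[w]}\Pr[V = v]\,\mathbb{E}_{x\in X}\big[M_j[v,\cdot] - M_j(x, y_j)[v,\cdot] : V = v\big]\Big\|\Big]\\
&\quad\le \sum_{v\in L}\Pr[V = v]\,\mathbb{E}_{y_j\in Y_j}\Big[\sum_{v'\in[w]}\Big|\frac{|S_j(v, v')|}{2^r} - \Pr[E(X, y_j)\in S_j(v, v') : V = v]\Big|\Big]\\
&\qquad+ \sum_{v\notin L}\Pr[V = v]\,\mathbb{E}_{y_j\in Y_j}\Big[\Big\|\mathbb{E}_{x\in X}\big[M_j[v,\cdot] - M_j(x, y_j)[v,\cdot] : V = v\big]\Big\|\Big]\\
&\quad\le \sum_{v\in L}\Pr[V = v]\cdot\epsilon + \sum_{v\notin L}\Pr[V = v]\cdot 1 \le \epsilon + w\cdot\frac{\epsilon}{w} = 2\epsilon \le \frac{\gamma}{n}.
\end{aligned}$$

□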

Theorem 1. There exist constants $c_2, c_3$ such that for every $n$, $w$, $r$ and $\gamma$, if $k = c_2(r + \log w + e(\cdots))$ and $t = c_3\log(\cdots)$, then $G : \{0,1\}^{k+nt} \to \{0,1\}^{nr}$ is a $\gamma$-pseudo-random generator for $(w,n,r)$-BPs that runs in NC.

Proof: Let $c_2 = 4$ and $c_3 = 2c_1$. Let $Y$ be a random variable uniformly distributed on $\{0,1\}^{nt}$. By Lemma 4, for every $(w,n,r)$-BP $P$,

$$\|D_P - D_{P\circ G}\| = \big\|D_P - \mathbb{E}_{y\in Y}[D_{P\circ G_{(y)}}]\big\| \le \mathbb{E}_{y\in Y}\big[\|D_P - D_{P\circ G_{(y)}}\|\big] \le \gamma.$$

Since each of the output bits of $G$ is simply computed by applying an extractor to some predefined subset of the input bits, and since the extractor runs in NC, each of the output bits of $G$ is computed in NC as well. □

3.2 Composing Generators for Branching Programs 

In this subsection, we will see that the generator $G$ that was defined in Subsection 3.1 can be composed with itself. To this end, we need the following proposition:




On the De-randomization of Space Bounded Computations 






Proposition 1. Let $P$ be a $(w,n,r)$-BP and let $G : \{0,1\}^{k+nt} \to \{0,1\}^{nr}$ be the pseudo-random generator for $(w,n,r)$-BPs of Theorem 1. Then, for every $x \in \{0,1\}^k$,

$$P^{(x)}(y_1, \ldots, y_n) = P(G(x, y_1, \ldots, y_n))$$

is a $(w,n,t)$-BP.

Proof: Note that $P^{(x)}$ has the same vertex set as $P$, but its edge set is smaller. Every edge from layer $i-1$ to layer $i$ of $P$ with label $E(x, y_i)$ for some value of $y_i \in \{0,1\}^t$ is labeled in $P^{(x)}$ by $y_i$, while all other edges of $P$ do not appear in $P^{(x)}$. Hence, $P^{(x)}$ is a $(w,n,t)$-BP. □

In order to be able to compose the generator, we need to view the BP as a $(w, n', r)$-BP for some $n'$. In the following definition we formally show how we do so.

Definition 8. Let $Q$ be a $(w,n,t)$-BP. We define the operation collapse-$c$, the result of which is a $(w, \lceil n/c\rceil, tc)$-BP $\hat{Q}$, as follows. Denote $n' = \lceil n/c\rceil$; then $\hat{Q}$ has $n'+1$ layers indexed by $0, \ldots, n'$, each of which contains $w$ vertices.

For every $i < n'$ and every $u, v \in [w]$, the edge between $u$ at layer $i$ and $v$ at layer $i+1$ exists in $\hat{Q}$ if and only if there exists a path in $Q$ between vertex $u$ at layer $ic$ and vertex $v$ at layer $(i+1)c$ (in the case that $(i+1)c > n$, we look for a path ending at vertex $v$ at layer $n$, which is of course of length shorter than $c$). The label of the edge $(u,v)$ in $\hat{Q}$ is the concatenation of the labels on the edges in the corresponding path in $Q$. In the case that the paths to the last layer in $Q$ are of length $c'$ shorter than $c$, the edges in $\hat{Q}$ corresponding to these paths have labels of length only $c't$. In this case we lengthen these labels by concatenating to each of them all the possible suffixes from $\{0,1\}^{(c-c')t}$, which results in replacing each edge by $2^{(c-c')t}$ parallel edges.
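Collapse-$c$ can be sketched on the transition-table representation used above (our assumption, not the paper's). Each new label is the concatenation of $c$ blocks of $t$ bits, first block most significant; in a final group shorter than $c$ layers the extra blocks act as the padding suffixes and are ignored, which produces exactly the $2^{(c-c')t}$ parallel edges of Definition 8.

```python
from itertools import product
from typing import List

Transitions = List[List[List[int]]]  # transitions[i][u][label] -> vertex at layer i+1


def collapse(transitions: Transitions, w: int, t: int, c: int) -> Transitions:
    """collapse-c: a (w, n, t)-BP becomes a (w, ceil(n/c), t*c)-BP."""
    n = len(transitions)
    collapsed: Transitions = []
    for first in range(0, n, c):
        group = transitions[first:first + c]  # c layers, or fewer at the end
        new_layer = []
        for u in range(w):
            row = [0] * (2 ** (t * c))
            for blocks in product(range(2 ** t), repeat=c):
                v = u
                for layer, b in zip(group, blocks):  # surplus blocks are padding
                    v = layer[v][b]
                label = 0
                for b in blocks:  # concatenate the t-bit blocks
                    label = (label << t) | b
                row[label] = v
            new_layer.append(row)
        collapsed.append(new_layer)
    return collapsed


parity = [[[0, 1], [1, 0]] for _ in range(3)]  # (2, 3, 1)-BP
coll = collapse(parity, w=2, t=1, c=2)          # (2, 2, 2)-BP
```

For instance, the label $10$ in the first collapsed layer moves vertex 0 to vertex 1 (parity of the bits 1 and 0), and in the last collapsed layer the labels $10$ and $11$ act identically because their second bit is padding.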

Assuming that $t < r$, we would like to perform the operation collapse-$\lceil r/t\rceil$ on $P^{(x)}$ and get a new BP, $\hat{P}^{(x)}$, with labels of length at least $r$. This fact allows us to feed the output of $G$ instead of a real random string to $\hat{P}^{(x)}$. For the rest of this subsection, we will properly define the composition of $G$ and prove that this composition really gives us a pseudo-random generator for BPs.

Definition 9. Let $n$, $w$, $r$ and $\alpha$ be given. Let $\gamma = \alpha/h$ and let $k = c_2(r + \log w + e(\cdots))$. Let $t = c_3\log(\cdots)$, $r' = t\lceil\max\{2, \cdots\}\rceil$ and $k' = 2c_2r'$.

Let $h = \lceil\cdots\rceil$, let $n_0 = n$ and, for $i > 0$, let $n_i = \lceil n_{i-1}t/r'\rceil$. For $i > 0$, we define generators $G_i : \{0,1\}^k \times (\{0,1\}^{k'})^{i-1} \times \{0,1\}^{n_{i-1}t} \to \{0,1\}^{nr}$ recursively as follows:

$$G_1(x_1, y_1, \ldots, y_n) = G(x_1, y_1, \ldots, y_n)$$

and for $i > 1$

$$G_i(x_1, \ldots, x_i, y_1, \ldots, y_{n_{i-1}}) = G_{i-1}(x_1, \ldots, x_{i-1}, G(x_i, y_1, \ldots, y_{n_{i-1}})),$$

where we think of each of the $n_{i-1}$ output blocks of $G$, of length $r'$, as partitioned into $r'/t$ blocks of length $t$, which gives a total of $n_{i-2}$ blocks from $\{0,1\}^t$. Now, for every $x_1 \in \{0,1\}^k$, every $x_2, \ldots, x_h \in \{0,1\}^{k'}$ and every $y \in \{0,1\}^t$ we define

$$\mathcal{G}(x_1, \ldots, x_h, y) = G_h(x_1, \ldots, x_h, y).$$

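Theorem 2. $\mathcal{G}$ is an $\alpha$-pseudo-random generator for $(w,n,r)$-BPs that expands a seed of length $l = O\left(r + \log n\cdot\frac{\log rw + e(\cdots)}{\max\{1, \log\log w - \log\log(\cdots)\}}\right)$, and it runs in space $O(l)$.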

Proof: The length of the seed that $\mathcal{G}$ uses is $k + k'(h-1) + t = O(r + k'h)$, and

$$k'h = O\left(\frac{k'\log n}{\log(r'/t)}\right) = O\left(\frac{r'\log n}{\log(r'/t)}\right) = O\left(\log n\cdot\frac{\log rw + 4^{\log^*(\cdots)}\log(\cdots)}{\max\{1, \log\log w - \log\log(\cdots)\}}\right);$$

thus, the total length of the seed is $O\left(r + \log n\cdot\frac{\log rw + e(\cdots)}{\max\{1, \log\log w - \log\log(\cdots)\}}\right)$. Since each of the output bits of $G$ is computed in NC, and $\mathcal{G}$ is a composition of $G$ at most $h$ times, $\mathcal{G}$ does not use more than $k'h = O(l)$ space.

Let $P$ be a $(w,n,r)$-BP with $n$ and $r$ that satisfy the conditions in Definition 9. We prove by induction on $i$ that $\|D_P - D_{P\circ G_i}\| \le i\gamma$.

For $i = 1$ the claim follows immediately from the fact that, by the choice of $k$ and $t$ and by Theorem 1, $G$ is a $\gamma$-pseudo-random generator for $(w,n,r)$-BPs. Assume correctness for $i-1$ and prove for $i > 1$:



$$\|D_P - D_{P\circ G_i}\| \le \|D_P - D_{P\circ G_{i-1}}\| + \|D_{P\circ G_{i-1}} - D_{P\circ G_i}\|$$

By the induction hypothesis, the first summand is bounded by $(i-1)\gamma$, so we are left to bound the second. For $x_1 \in \{0,1\}^k$, $x_2, \ldots, x_{i-1} \in \{0,1\}^{k'}$ and $y_1, \ldots, y_{n_{i-2}} \in \{0,1\}^t$ we define

$$P^{(x_1,\ldots,x_{i-1})}(y_1, \ldots, y_{n_{i-2}}) = P(G_{i-1}(x_1, \ldots, x_{i-1}, y_1, \ldots, y_{n_{i-2}})).$$

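Let $X_1$ be a random variable uniformly distributed over $\{0,1\}^k$ and let $X_2, \ldots, X_{i-1}$ be random variables uniformly distributed over $\{0,1\}^{k'}$. Then,

$$D_{P\circ G_{i-1}} = \mathbb{E}_{x_1\in X_1,\ldots,x_{i-1}\in X_{i-1}}\big[D_{P^{(x_1,\ldots,x_{i-1})}}\big],\qquad D_{P\circ G_i} = \mathbb{E}_{x_1\in X_1,\ldots,x_{i-1}\in X_{i-1}}\big[D_{P^{(x_1,\ldots,x_{i-1})}\circ G}\big],$$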

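and we get that

$$\|D_{P\circ G_{i-1}} - D_{P\circ G_i}\| \le \mathbb{E}_{x_1\in X_1,\ldots,x_{i-1}\in X_{i-1}}\big[\|D_{P^{(x_1,\ldots,x_{i-1})}} - D_{P^{(x_1,\ldots,x_{i-1})}\circ G}\|\big] \le \gamma,$$

where the last inequality holds since $P^{(x_1,\ldots,x_{i-1})}$ is a $(w, n_{i-2}, t)$-BP, performing collapse-$(r'/t)$ on it gives a $(w, n_{i-1}, r')$-BP, and by the choice of $k'$ and $t$ and by Theorem 1, $G$ is a $\gamma$-pseudo-random generator for $(w, n_{i-1}, r')$-BPs. We conclude that $\|D_P - D_{P\circ\mathcal{G}}\| \le h\gamma \le \alpha$. □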
Corollary 1. There is an $\alpha$-pseudo-random generator $\mathcal{G} : \{0,1\}^l \to \{0,1\}^R$ for space-$S$ machines that use $R$ random bits, where $\mathcal{G}$ uses a seed of length

$$l = O\left(\frac{(S + e(\cdots))\log R}{\max\{1, \log S - \log\log R\}}\right).$$

Proof: Use Theorem 2 and Lemma 3. □



4 Derandomization of Space Bounded Computations 

In this section we show how to derandomize space bounded computations using 
our generator. We will use the method of [6], but instead of employing the gen- 
erator of [4], we are taking the generator we constructed in the previous section. 
Unfortunately, this is not a straightforward task, since a close investigation of [6] reveals that they use a special property of Nisan's generator that our generator does not have. This property allows one to fix most of the
bits in the seed of the generator and yet, by exhausting all possibilities of the 
remaining bits, to get a good estimation on the probability of moving from one 
configuration to another in the machine (or in the branching program). Our first 
task here is to show that every generator can be augmented with this property 
and only then, we proceed to use our generator in [6]. To this end, we will use 
samplers, such as the oblivious sampler of [7]. 

A second use for the sampler is to decrease the error. A key idea in this paper 
is to have the error of the generator be 1 over polynomial in the length of the 
BP. However, when we carry out the perturbation and truncation technique of [6], we introduce an error which is multiplied by a polynomial in the width of the BP. This factor cannot be tolerated, so we need to somehow decrease the error.
Again, we use the (very same) sampler in a similar way to [1]. 

Lemma 5. Let $b < 1$. Let $G : \{0,1\}^l \to (\{0,1\}^r)^n$ be an $\alpha$-pseudo-random generator for $(w,n,r)$-BPs that runs in space $s$. For every $\beta > w2^{-\cdots}$, there exists an algorithm $Q : \{0,1\}^{\cdots} \times [k] \to (\{0,1\}^r)^n$, where $k = \cdots$, such that $Q$ runs in space $s + \mathrm{poly}\log l$ and, if for every $y$ we define the function $Q_y : [k] \to (\{0,1\}^r)^n$ by $Q_y(i) = Q(y,i)$, then for every $(w,n,r)$-BP $P$,

$$\Pr_{y}\big[\|D_{P\circ Q_y} - D_{P\circ G}\| > \beta\big] \le w2^{-\cdots}.$$

Proof: Let $A : \{0,1\}^{\cdots} \times [k] \to \{0,1\}^l$ be the $(\beta/w, 2^{-\cdots})$ oblivious sampler of Lemma 2. Define $Q = G \circ A$. Let $P$ be a $(w,n,r)$-BP. For every $j \in [w]$, define the function $f_j : \{0,1\}^l \to \{0,1\}$ by $f_j(x) = 1$ if and only if $P(G(x)) = j$. We get that for every $y$

$$\|D_{P\circ Q_y} - D_{P\circ G}\| = \sum_{j\in[w]}\Big|\Pr_{i\in[k]}[P(G(A(y,i))) = j] - \Pr_{x\in\{0,1\}^l}[P(G(x)) = j]\Big|,$$

and by the fact that $A$ is a $(\beta/w, 2^{-\cdots})$ sampler we conclude that

$$\Pr_{y}\big[\|D_{P\circ Q_y} - D_{P\circ G}\| > \beta\big] \le \sum_{j\in[w]}\Pr_{y}\Big[\big|\mathbb{E}_{i\in[k]}[f_j(A(y,i))] - \mathbb{E}_{x\in\{0,1\}^l}[f_j(x)]\big| > \frac{\beta}{w}\Big] \le w2^{-\cdots}.$$

□



Corollary 2. Let $b < 1$. There exists an algorithm $Q$ such that for every $w$, $n$, $r$, $\alpha$ and $\beta > 2^{-\cdots}$, where $l = O\left(r + \log n\cdot\frac{\cdots}{\max\{1, \log\log w - \cdots\}}\right)$, given a seed $y$ of length $|y| = l$ and an input $i$ of length $O(\log\frac{\cdots}{\beta})$, $Q$ runs in space $O(l)$ and outputs $n$ blocks of length $r$ such that, with probability at least $1 - w2^{-\cdots}$ over $y \in \{0,1\}^l$, $Q_y(\cdot) = Q(y,\cdot)$ is an $(\alpha+\beta)$-pseudo-random generator for $(w,n,r)$-BPs.

Proof: Combine Theorem 2 and Lemma 5, using, for every $y \in \{0,1\}^l$,

$$\|D_{P\circ Q_y} - D_P\| \le \|D_{P\circ Q_y} - D_{P\circ\mathcal{G}}\| + \|D_{P\circ\mathcal{G}} - D_P\|.$$

□

Now we can start following the proof of [6], while replacing Nisan's generator by ours. For the rest of this paper, we assume that the reader is familiar with the details of [6]. The first step will be to redefine their PRS algorithm, which will now use our generator $Q$. The following Lemma replaces [6, Lemma 4.1].

Lemma 6. Let $b < 1$. Given as input a $w \times w$ sub-stochastic matrix $M$, an $\alpha > 0$ and integers $t$ and $K$, set $r = t - \log\alpha$ and $l = O\left(t\cdot\frac{\cdots}{\max\{1, \log\log w - \log r\}}\right)$. Then, if $K \le \cdots$, the algorithm $PRS(M, t, r, y)$ runs in space $O(r)$, takes a random string $y \in \{0,1\}^l$ and computes a sub-stochastic matrix of dimension $w$ which approximates $M^{2^t}$ with accuracy $K - \log\alpha$ and error probability $w2^{-\cdots}$.

Proof: We basically follow the proof of [6, Lemma 4.1], employing Corollary 2 and noticing that the PRS algorithm does not use more than $O(r)$ space other than the space needed to compute $Q$. Moreover, although PRS will be part of a recursive algorithm, the $Q$ algorithm does not participate in the recursion; thus, it uses the same $O(l)$ extra space each time it is invoked (even in different levels of the recursion). Finally, when using Corollary 2 with $\beta = 2^{-\cdots}$ and $n = 2^t$, the error probability over $y \in \{0,1\}^l$ of using $Q$ is $w2^{-\cdots}$. □

Now, we let the algorithm MAIN of [6] work normally, only using our new version of the PRS algorithm, which uses a shorter random input. To fully utilize this extra power, we fix the parameters somewhat differently. Given an $\alpha > 0$ and integers $w$ and $t$, we compute the following parameters: $r = t - \log\alpha$, $K = 13(r + \log w)$, $D = 7(r + \log w)$, $d = K - D$ as in [6], and we set $t_1 = \lceil\cdots\max\{1, \log\log w - \log r\}\rceil$, $t_2 = \lceil t/t_1\rceil$ (for simplicity assume in the sequel that $t = t_1t_2$) and $l = O(\cdots)$. The algorithm MAIN gets as input a sub-stochastic $w \times w$ matrix $M$ and random inputs $y \in \{0,1\}^l$ and $q_1, \ldots, q_{t_2} \in \{0,1\}^D$, and computes a sequence of matrices, where for every $i \in [t_2]$

$$M_i = \big\lfloor\Sigma_{\delta_i}\big(PRS(M_{i-1}, t_1, r, y)\big)\big\rfloor_d,\qquad M_0 = M \ \text{ and } \ \delta_i = q_i2^{-K}.$$
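One step of MAIN applies the perturbation $\Sigma_{\delta_i}$ and the truncation $\lfloor\cdot\rfloor_d$ of [6] to the estimate returned by PRS. The sketch below uses exact rational arithmetic and assumes, as we read [6], that $\Sigma_\delta$ subtracts $\delta$ from every entry and clamps at 0, and that $\lfloor\cdot\rfloor_d$ rounds every entry down to a multiple of $2^{-d}$. The PRS estimate itself is taken as an input, and `main_step` is a hypothetical name.

```python
import math
from fractions import Fraction
from typing import List

Matrix = List[List[Fraction]]


def perturb(M: Matrix, delta: Fraction) -> Matrix:
    """Sigma_delta: lower every entry by delta, clamping at 0."""
    return [[max(x - delta, Fraction(0)) for x in row] for row in M]


def truncate(M: Matrix, d: int) -> Matrix:
    """Floor_d: round every entry down to a multiple of 2^{-d}."""
    scale = 2 ** d
    return [[Fraction(math.floor(x * scale), scale) for x in row] for row in M]


def main_step(prs_estimate: Matrix, q: int, K: int, d: int) -> Matrix:
    """M_i = floor_d(Sigma_{delta_i}(PRS(...))) with delta_i = q_i * 2^{-K}."""
    delta = Fraction(q, 2 ** K)
    return truncate(perturb(prs_estimate, delta), d)
```

For example, with $\delta = 1/16$ and $d = 2$, the entry $3/4$ becomes $11/16$ and then $1/2$, while $1/4$ becomes $3/16$ and then $0$.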

Theorem 3. The algorithm MAIN approximates $M^{2^t}$ with $L_1$ distance $2^{2t}(\alpha + 2w2^{-d})$ and error probability $t_2(w2^{-\cdots} + 2w^22^{-\cdots})$, and it does so in space $O(l + Kt_2)$.

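Proof: For a $w \times w$ matrix $N$, an algorithm $g : \{0,1\}^{\cdots} \to (\{0,1\}^r)^{2^{t_1}}$ and an input $y$ of $g$, define the operation $S_g(N; y)$ as the $w \times w$ matrix that in its $(i,j)$ entry contains 1 if we can move from $i$ to $j$ in $2^{t_1}$ steps in the $(w, 2^{t_1}, r)$ machine associated with $N$, where the "instructions" are the $2^{t_1}$ blocks of length $r$ of $g(y)$. Note that $PRS(N, t_1, r, y) = \mathbb{E}_{j\in[k]}[S_{Q_y}(N; j)]$. Define the following sequences of matrices for every $i \in [t_2]$:

$$N_i = \big\lfloor\Sigma_{\delta_i}\big(\mathbb{E}_y[S_G(N_{i-1}; y)]\big)\big\rfloor_d,\qquad M_i(y) = \big\lfloor\Sigma_{\delta_i}\big(\mathbb{E}_{j\in[k]}[S_{Q_y}(M_{i-1}(y); j)]\big)\big\rfloor_d,$$

where $N_0 = M_0(y) = M$. Now, since $G$ is an $\alpha$-pseudo-random generator for $(w, 2^{t_1}, r)$-BPs and since the operations $\lfloor\cdot\rfloor_d$ and $\Sigma_\delta(\cdot)$ introduce an error of at most $w2^{-d}$ each, we get that

$$\begin{aligned}
\|N_i - M^{2^{t_1 i}}\| &\le \big\|\lfloor\Sigma_{\delta_i}(\mathbb{E}_y[S_G(N_{i-1}; y)])\rfloor_d - (N_{i-1})^{2^{t_1}}\big\| + \big\|(N_{i-1})^{2^{t_1}} - (M^{2^{t_1(i-1)}})^{2^{t_1}}\big\|\\
&\le \alpha + 2\cdot w2^{-d} + 2^{t_1}2^{2t_1(i-1)}(\alpha + 2\cdot w2^{-d})\\
&\le 2^{2t_1 i}(\alpha + 2\cdot w2^{-d}).
\end{aligned}$$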

We will now prove by induction on $i$ that with high probability over the random choices of $y$ and $\delta_1, \ldots, \delta_i$, the matrices $M_i(y) = N_i$. More precisely, we will prove by induction on $i$ that

$$\Pr_{y, q_1, \ldots, q_i}\Big[\bigwedge_{j=1}^{i} M_j(y) = N_j\Big] \ge 1 - i(w2^{-\cdots} + 2w^22^{-\cdots}).$$



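For $i = 0$, the statement is trivial.

Denote $M_p(y) = \mathbb{E}_{j\in[k]}[S_{Q_y}(N_{i-1}; j)]$ and $N_p = \mathbb{E}_y[S_G(N_{i-1}; y)]$. Then

$$\begin{aligned}
\Pr_{y, q_i}\big[M_i(y) \ne N_i : M_{i-1}(y) = N_{i-1}\big] &\le \Pr_{y}\big[\|M_p(y) - N_p\| > 2^{-K}\big]\\
&\quad+ \Pr_{y, q_i}\big[\lfloor\Sigma_{\delta_i}(M_p(y))\rfloor_d \ne \lfloor\Sigma_{\delta_i}(N_p)\rfloor_d : \|M_p(y) - N_p\| \le 2^{-K}\big].
\end{aligned}$$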
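Remembering that $Q = G \circ A$ and using Lemma 5, we bound the first summand by $w2^{-\cdots}$. The second summand is bounded in [6, Lemma 5.5] by $2w^22^{-\cdots}$, so we get that

$$\begin{aligned}
\Pr\Big[\bigwedge_{j=1}^{i} M_j(y) = N_j\Big] &= \Pr\Big[\bigwedge_{j=1}^{i-1} M_j(y) = N_j\Big]\cdot\Pr\Big[M_i(y) = N_i : \bigwedge_{j=1}^{i-1} M_j(y) = N_j\Big]\\
&\ge \big(1 - (i-1)(w2^{-\cdots} + 2w^22^{-\cdots})\big)\big(1 - w2^{-\cdots} - 2w^22^{-\cdots}\big)\\
&\ge 1 - i(w2^{-\cdots} + 2w^22^{-\cdots}).
\end{aligned}$$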



Now we can conclude that

$$\Pr\big[\|M_{t_2}(y) - M^{2^t}\| > 2^{2t_1t_2}(\alpha + 2w2^{-d})\big] \le t_2(w2^{-\cdots} + 2w^22^{-\cdots}).$$



The space required by algorithm MAIN is $O(l)$ to store the $y$ argument, $O(t_2D)$ to store $q_1, \ldots, q_{t_2}$, the $O(l)$ space for computing the function $Q$, and then, for each of the $t_2$ levels of the recursion, $O(K)$ space (for the PRS algorithm by Lemma 6 and for computing $\Sigma_{\delta_i}$ and $\lfloor\cdot\rfloor_d$). This sums up to $O(l + t_2K)$ space. □



Corollary 3. Every language which is decidable randomly in space $S$ using $R \le 2^{\cdots}$ random bits with two-sided error can be decided deterministically in space

$$O\left(\frac{(S + e(R))\sqrt{\log R}}{\sqrt{\max\{1, \log S - \log\log R\}}}\right).$$

Proof: Let $M$ be the stochastic matrix associated with the Turing machine that decides the language in space $S$ using $R$ random bits, i.e., the $(i,j)$ entry of $M$ contains the probability of moving from configuration $i$ to $j$ in the Turing machine on a single random bit. Set $w = 2^S$ ($M$ is a $w \times w$ matrix), $t = \lceil\log R\rceil$ and $\alpha = R^{-\cdots}$. We now exhaust all the possible random inputs $y \in \{0,1\}^l$ and $q_1, \ldots, q_{t_2} \in \{0,1\}^D$ to the algorithm MAIN and run it on every input to get an estimation of $M^{2^t}$ by taking the average of every entry.

By Theorem 3 we approximate the matrix with error smaller than $\cdots$ (by the choice of parameters) and in space $O\left(\frac{(S + e(R))\sqrt{\log R}}{\sqrt{\max\{1, \log S - \log\log R\}}}\right)$. Taking a majority vote on the accepting configuration, we decide whether the given input is in the language. □
Abstract.



1 Introduction 

The aim of this paper is to advocate the use of Talagrand’s isoperimetric inequality [10] and an extension of it due to Marton [5, 6] as a tool for the analysis of distributed randomized algorithms that work in the locality paradigm. Two features of the inequality are crucially used in the analysis. First, by exploiting the non-uniform and asymmetric conditions required by the inequality (in contrast to previous methods), one can exercise very refined control over the influence of the underlying variables and obtain significantly stronger bounds. Second, by using an extension of the basic inequality to dependent variables due to Marton [6], the method succeeds in spite of the lack of full independence amongst the underlying variables. This last feature makes it a particularly valuable tool in Computer Science contexts, where lack of independence is omnipresent. Our contribution is to highlight the special relevance of the method for Computer Science applications by demonstrating its use in the context of a class of distributed computations in the locality paradigm.

We give a high probability analysis of a distributed algorithm for edge colouring a graph [8]. Apart from its intrinsic interest as a classical combinatorial problem, and as a paradigm example for locality in distributed computing, edge colouring is also useful from a practical standpoint because of its connection to scheduling. In distributed networks or architectures, an edge colouring corresponds to a set of data transfers that can be executed in parallel. So a partition of the edges into a small number of colour classes, i.e. a “good” edge colouring, gives an efficient schedule to perform data transfers (for more details, see [8, 2]). The analysis of edge colouring algorithms published in the literature is extremely long and difficult, and the one in [8] is, moreover, based on a certain ad hoc extension of the Chernoff-Hoeffding bounds. In contrast, our analysis is a very simple, short and streamlined application of Talagrand’s inequality, only two pages long.

* Work partly done while at BRICS, Department of Computer Science, University of Aarhus, Denmark. Partially supported by the ESPRIT Long Term Research program of the EU under contract No. 20244 (ALCOM-IT).

In § 5, we outline how other edge and vertex colouring algorithms can also be tackled by the same methods in a general framework. These examples are, moreover, intended as a dramatic illustration of the versatility and power of the method for the analysis of locality in distributed computing in general.



2 Distributed Edge Colouring 



Vizing’s Theorem shows that every graph G can be edge coloured sequentially in polynomial time with $\Delta$ or $\Delta + 1$ colours, where $\Delta$ is the maximum degree of the input graph (see, for instance, [1]).

It is a challenging open problem whether colourings as good as these can be computed fast in a distributed model. In the absence of such a result, one might aim at the more modest goal of computing reasonably good colourings instead of optimal ones. By a trivial modification of a well-known vertex colouring algorithm of Luby, it is possible to edge colour a graph using $2\Delta - 1$ colours in $O(\log n)$ rounds (where n is the number of processors) [4].

We shall present and analyze a simple localised distributed algorithm that computes near-optimal edge colourings. The algorithm proceeds in a sequence of rounds. In each round, a simple randomised heuristic is invoked to colour a significant fraction of the edges successfully. The remaining edges are passed over to succeeding rounds. This continues until the number of edges is small enough to employ a brute-force method at the final step. For example, the algorithm of Luby mentioned above can be invoked when the degree of the graph becomes small, i.e. when the condition $\Delta \gg \log n$ is no longer satisfied.

First, the algorithm invokes a reduction to bipartite graphs by a standard procedure, see [8]. We describe the action carried out by the algorithm in a single round starting with a bipartite graph. At the beginning of each round, there is a palette of fresh new available colours, $[\Delta]$, where $\Delta$ is the maximum degree of the graph at the current stage.

Algorithm P¹: There is a two-step protocol:

— Each bottom vertex, in parallel, makes a proposal independently of other bottom vertices by assigning a random permutation of the colours to its incident edges.

— Each top vertex, in parallel, then picks a winner out of every set of incident edges that have the same colour. Tentative colours of winner edges become final.

— The losers, i.e. edges that are not winners, are decoloured and passed to the next round.

¹ P for permutation.
It is apparent that the algorithm is truly distributed. That is to say, each vertex need only exchange information with its neighbours to execute the algorithm. This and its simplicity make the algorithm amenable to implementation in a distributed environment. Algorithm P is exactly the algorithm used in [8].

We focus all our attention on the analysis of one round of the algorithm. Let $\Delta$ denote the maximum degree of the graph at the beginning of the round and $\Delta'$ the maximum degree of the leftover graph. One can easily show that $E[\Delta' \mid \Delta] \le \beta\Delta$ for some constant $0 < \beta < 1$. The goal is to show that this holds with high probability. This is done in § 4, after the relevant tools (the concentration of measure inequalities) are introduced in the next section.



3 Concentration of Measure Inequalities 



Talagrand’s isoperimetric inequality for the concentration of measure involves a product space $\Omega = \prod_{i \in [n]} \Omega_i$ equipped with the product measure $P = \prod_{i \in [n]} P_i$ (where the $P_i$ are arbitrary measures on the individual spaces) and, crucially, a certain notion of “convex distance” between a point $x \in \Omega$ and a subset $A \subseteq \Omega$:

$$ d_T(x, A) := \sup_{\alpha : \sum_i \alpha_i^2 = 1} \; \min_{y \in A} \sum_{i : x_i \neq y_i} \alpha_i . \tag{1} $$

(The sup is over all reals $\alpha_1, \ldots, \alpha_n$ satisfying the stated condition.) For sets $A, B \subseteq \Omega$, set $d_T(A, B) := \min_{x \in A} d_T(x, B)$.

Talagrand’s inequality [10] is: 

$$ P(A)\,P(B) \le \exp\left(-d_T^2(A, B)/4\right), \qquad A, B \subseteq \Omega. \tag{2} $$

In an alternative elegant approach, Marton [5, 6] shows that this inequality is in turn a consequence of certain information-theoretic inequalities. This requires an analogue of (1) for distributions. Note that for a one-point set $A = \{y\}$, the distance (1) is

$$ d_T(x, y) = \sup_{\alpha : \sum_i \alpha_i^2 = 1} \; \sum_{i : x_i \neq y_i} \alpha_i . $$
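For a one-point set the supremum has a closed form. By Cauchy-Schwarz, $\sum_{i : x_i \neq y_i} \alpha_i \le \sqrt{|\{i : x_i \neq y_i\}|}$ for every unit vector $\alpha$. Equality holds when $\alpha$ is spread evenly over the differing coordinates. So $d_T(x, y)$ is the square root of the Hamming distance. The helper name in this minimal Python sketch is illustrative.

```python
import math


def convex_distance_point(x, y):
    """d_T(x, {y}) from (1) for a one-point set.

    The supremum over unit vectors alpha of sum_{i: x_i != y_i} alpha_i equals
    sqrt(#{i : x_i != y_i}), attained by alpha_i = 1/sqrt(#differences) on the
    coordinates where x and y differ and 0 elsewhere.
    """
    if len(x) != len(y):
        raise ValueError("points must have the same number of coordinates")
    differences = sum(1 for a, b in zip(x, y) if a != b)
    return math.sqrt(differences)
```

For example, `convex_distance_point((1, 2, 3), (1, 0, 0))` is $\sqrt{2}$. For general sets $A$ no such closed form exists, which is why the inequalities below are stated abstractly.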

Now, for distributions $P$ and $Q$ on $\Omega$, we define $d_T(P, Q)$ as the minimum of $E[d_T(X, Y)]$ over all joint distributions of $X$ and $Y$ with marginals $P$ and $Q$ respectively.

We also require the notion of conditional or relative entropy $H(P \mid Q)$ of distribution $P$ with respect to $Q$:

$$ H(P \mid Q) := \sum_{\omega} P(\omega) \log \frac{P(\omega)}{Q(\omega)}. \tag{3} $$

(This is also sometimes called the informational divergence and denoted $D(P \,\|\, Q)$.)
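Definition (3) can be transcribed directly for finite distributions. This minimal Python sketch assumes natural logarithms, since the text does not fix the base. It also assumes distributions are given as dicts from outcomes to probabilities. `relative_entropy` is a hypothetical helper name.

```python
import math


def relative_entropy(p, q):
    """H(P | Q) = sum_w P(w) log(P(w) / Q(w)) as in (3), using the natural log.

    Outcomes with P(w) = 0 contribute 0. If P(w) > 0 but Q(w) = 0, the
    divergence is infinite.
    """
    total = 0.0
    for w, pw in p.items():
        if pw == 0:
            continue
        qw = q.get(w, 0.0)
        if qw == 0:
            return math.inf
        total += pw * math.log(pw / qw)
    return total
```

Note that $H(P \mid Q)$ is not symmetric in $P$ and $Q$, and it vanishes exactly when the two distributions agree.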
The information-theoretic inequality is: if $P$ is a product measure, then for any other measure $Q$,

$$ d_T(P, Q) \le \sqrt{2 H(P \mid Q)}. \tag{4} $$

One can show that the concentration of measure inequality (2) is a consequence 
of (4). 

Marton generalises (4) to the case when the underlying measure $P$ is not the product measure (i.e. the corresponding variables are not independent). The setting is very similar to that in martingale inequalities, where the underlying variables are “exposed” one by one. Suppose $k < n$ and $x_1^k$, ${x'}_1^k$ differ only in the last co-ordinate:

$$ x_1^{k-1} = {x'}_1^{k-1}, \qquad x_k \neq x'_k . $$

We want to consider the corresponding conditional distributions on the remaining variables, once the first $k$ variables are fixed to the values $x_1^k$ and ${x'}_1^k$ respectively. Let us consider a coupling of the two conditional distributions $P(\cdot \mid X_1^k = x_1^k)$ and $P(\cdot \mid X_1^k = {x'}_1^k)$ and denote it by $\pi_k(\cdot \mid x_1^k, {x'}_1^k)$. This can be thought of as the joint distribution of a pair of random variable sequences.
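Now for $k < i \le n$, define the “influence” coefficients $v_{i,k}(x_1^k, {x'}_1^k)$ as the probability, under the coupling, that the $i$-th co-ordinates of the two sequences differ:

$$ v_{i,k}(x_1^k, {x'}_1^k) := \pi_k\big(X_i \neq X'_i \mid x_1^k, {x'}_1^k\big). \tag{5} $$

Put $v_{k,k}(x_1^k, {x'}_1^k) := 1$. Set

$$ v_{i,k} := \max_{x_1^k, {x'}_1^k} v_{i,k}(x_1^k, {x'}_1^k). $$

Now define

$$ U := \max_i \sum_{1 \le k < i} v_{i,k} \tag{6} $$

and

$$ V := \max_k \sum_{k < i \le n} v_{i,k}. \tag{7} $$

Then the generalisation of (4) is [6, Theorem 5.1]:

$$ d_T(P, Q) \le \sqrt{UV}\,\sqrt{2 H(P \mid Q)}. \tag{8} $$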



As an example of this inequality, consider the space of the $m$-fold product of the symmetric group $S_n$. The measure on each component space is the uniform measure, and on the product space we take the product measure. To compute the coefficients (5), let us expose the variables component by component. Having exposed all variables in the first $t < m$ components, suppose we have further exposed $k < n$ variables in the $(t+1)$st component. We take the natural coupling that simply swaps the values $x_k$ and $x'_k$ in the permutation of this component and is the identity everywhere else. For this coupling, an easy calculation shows that

$$ v_{i,k}(x_1^k, {x'}_1^k) = \frac{1}{n-k} $$

for $k < i$ within this component, and zero everywhere else. Thus

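$$ U = \max_i \sum_{1 \le k < i} v_{i,k} = \max_i \sum_{1 \le k < i} \frac{1}{n-k} = H_{n-1} \le H_n , $$

where $H_n \sim \ln n$ is the $n$th harmonic number, and

$$ V = \max_k \sum_{k < i \le n} v_{i,k} = \max_k \sum_{k < i \le n} \frac{1}{n-k} = 1. $$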
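The two maxima for a single $S_n$ component can be checked exactly with rational arithmetic. This Python sketch assumes the off-diagonal coefficients $v_{i,k} = 1/(n-k)$ for $1 \le k < i \le n$. It also assumes the strict-inequality index ranges in the sums defining $U$ and $V$. The zero coefficients from other components do not change the maxima. The function name is illustrative.

```python
from fractions import Fraction


def influence_sums(n):
    """Return (U, V) for one S_n component under the swap coupling (n >= 2)."""

    def v(i, k):
        # Off-diagonal influence coefficient, valid for 1 <= k < i <= n.
        return Fraction(1, n - k)

    # U: largest row sum over k < i.  V: largest column sum over i > k.
    U = max(sum((v(i, k) for k in range(1, i)), Fraction(0)) for i in range(1, n + 1))
    V = max(sum((v(i, k) for i in range(k + 1, n + 1)), Fraction(0)) for k in range(1, n))
    return U, V
```

For $n = 4$ this gives $U = 1/3 + 1/2 + 1 = 11/6 = H_3$ and $V = 1$. Every column sum equals $(n-k) \cdot \frac{1}{n-k} = 1$, which is why $V$ does not grow with $n$.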





Hence, on the product space $S_n^m$ of the $m$-fold product of the symmetric group, we get the inequality:

$$ d_T(P, Q) \le \sqrt{2 H_n \, H(P \mid Q)}. \tag{9} $$

Actually Talagrand is able to prove a stronger result on the symmetric group. 
For the m-fold product of the symmetric group, he proves: 

$$ P(A)\,P(B) \le \exp\left(-d_T^2(A, B)/16\right). \tag{10} $$

There is a simple packaged form in which it is often convenient to apply these 
inequalities. In this form, the two components of the analysis are separated from 
each other: 



— The probabilistic component which gives a concentration of measure inequal- 
ity. 

— The smoothness of the function of interest. 

In applications, the concentration of measure inequality is taken ready-made 
without extra effort and one has only to verify the smoothness conditions on the 
function, which is often very easy. 

Theorem 1. Let $f$ be a real-valued function on a product space $\Omega = \prod_{i \in [n]} \Omega_i$ with a measure $P$ (which is not necessarily the product measure). Suppose that for each $x \in \Omega$ there are reals $\alpha_i(x)$, $i \in [n]$, such that either one of the following conditions holds:

$$ f(x) \le f(y) + \sum_{i : x_i \neq y_i} \alpha_i(x) \quad \text{for all } y \in \Omega, \tag{11} $$

$$ f(x) \ge f(y) - \sum_{i : x_i \neq y_i} \alpha_i(x) \quad \text{for all } y \in \Omega. \tag{12} $$

— If a measure concentration inequality holds in the form

$$ P(A)\,P(B) \le \exp\left(-d_T^2(A, B)/a\right) \tag{13} $$

for some $a > 0$, and uniformly for all $x \in \Omega$,

$$ \sum_i \alpha_i^2(x) \le c, \tag{14} $$

then we have the following concentration result for $f$ around its median value $M[f]$:

$$ \Pr\big[|f - M[f]| > t\big] \le 2 \exp\left(-\frac{t^2}{ac}\right). \tag{15} $$

— If a measure concentration inequality holds in the form

$$ d_T(P, Q) \le a \sqrt{2 H(P \mid Q)} \tag{16} $$

for some $a > 0$, and also the coefficients $\alpha_i(x)$ satisfy

$$ E\Big[\sum_i \alpha_i^2(X)\Big] \le c, \tag{17} $$

then, we have the following concentration result for f around its mean: 

P[\f-m>t]<2exp(^-^^. (18) 

Note two features of condition (11) or (12). First, the asymmetry: we only need one of these one-sided versions to hold. Second, the non-uniformity: the coefficients $\alpha_i$ are allowed to depend on $x$. Both these features contribute to the power of the inequalities in applications.
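The text states (15) without proof. A sketch of the upper tail under condition (11) runs as follows. It assumes $M[f]$ is a median, so that $\Pr[f \le M[f]] \ge 1/2$, and reads (13) as holding for every pair of sets. The lower tail is handled the same way with the level sets $\{f \le M[f] - t\}$ and $\{f \ge M[f]\}$.

```latex
% Upper tail of (15) from (11), (13) and (14); t > 0.
Let $A := \{y : f(y) \le M[f]\}$ and $B := \{x : f(x) \ge M[f] + t\}$.
For $x \in B$ and $y \in A$, condition (11) gives
\[
  t \le f(x) - f(y) \le \sum_{i : x_i \ne y_i} \alpha_i(x),
\]
so $\alpha(x) \ne 0$. Put $\beta := \alpha(x)/\|\alpha(x)\|_2$, a unit vector.
By (14), $\|\alpha(x)\|_2 \le \sqrt{c}$, hence
\[
  d_T(x, A) \ge \min_{y \in A} \sum_{i : x_i \ne y_i} \beta_i
            \ge \frac{t}{\|\alpha(x)\|_2} \ge \frac{t}{\sqrt{c}} .
\]
Thus $d_T(B, A) = \min_{x \in B} d_T(x, A) \ge t/\sqrt{c}$, and (13) applied
to the pair $(B, A)$ yields
\[
  P(B)\,P(A) \le \exp\!\left(-\frac{t^2}{ac}\right).
\]
Since $P(A) \ge 1/2$,
\[
  \Pr\big[f \ge M[f] + t\big] \le 2\exp\!\left(-\frac{t^2}{ac}\right).
\]
```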

4 High Probability Analyses 

4.1 Top vertices 

The analysis is particularly easy when $v$ is a top vertex in Algorithm P. In this case, the incident edges all receive colours independently of each other. This is exactly the situation of the classical balls and bins experiment: the incident edges are the “balls” that fall at random independently into the colours that represent the “bins”. Let $T_e$, $e \in E$, be the random variables taking values in $[\Delta]$ that represent the tentative colours of the edges. Then the number of edges unsuccessfully coloured around $v$ (and hence the new degree) is a function $f(T_e, e \in N^1(v))$, where $N^1(v)$ denotes the set of edges incident on $v$. It is easily seen that this function has the Lipschitz property with constant 1: changing only one argument while leaving the others fixed changes the value of $f$ by at most 1. Thus:

— We can take all coefficients $\alpha_i = 1$ in (11).

— Since the variables $T_e$, $e \in N^1(v)$, are independent, we can apply the Talagrand inequality (2) for product spaces when $v$ is a “top” vertex.

Hence, applying the first part of Theorem 1, we get the following sharp concen- 
tration result easily: 

Theorem 2. Let $v$ be a top vertex in algorithm P and let $f$ be the number of edges around $v$ that are successfully coloured in one round of the algorithm. Then

$$ \Pr\big[|f - E[f]| > t\big] \le 2 \exp\left(-\frac{t^2}{4\Delta}\right). $$

For $t := \varepsilon\Delta$ ($0 < \varepsilon < 1$), this gives an exponentially decreasing probability for deviations around the mean. If $\Delta \gg \log n$, then the probability that the new degree of any vertex deviates far from its expected value is inverse polynomial, i.e. the new max degree is sharply concentrated around its mean.
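The balls-and-bins picture for a top vertex is easy to simulate. This Python sketch assumes the top vertex has exactly $\Delta$ incident edges whose tentative colours are independent and uniform on $[\Delta]$, as argued above. It counts successes as the number of distinct colours drawn, because each colour class produces exactly one winner. It reports only an empirical tail frequency for $|f - E[f]| > \varepsilon\Delta$. It is an illustration, not a check of any constant. Function names are hypothetical.

```python
import random


def successes_at_top_vertex(delta, rng):
    """Edges coloured successfully at a top vertex with delta incident edges.

    Each tentative colour is modelled as independent and uniform on [delta];
    every colour class yields exactly one winner, so the number of successes
    equals the number of distinct colours drawn (occupied bins).
    """
    return len({rng.randrange(delta) for _ in range(delta)})


def tail_frequency(delta, eps, trials=2000, seed=0):
    """Return (E[f], fraction of trials with |f - E[f]| > eps * delta)."""
    rng = random.Random(seed)
    mean = delta * (1 - (1 - 1 / delta) ** delta)  # exact expected number of occupied bins
    samples = [successes_at_top_vertex(delta, rng) for _ in range(trials)]
    deviations = sum(1 for s in samples if abs(s - mean) > eps * delta)
    return mean, deviations / trials
```

With $\Delta = 100$, the mean is about $63.4$ and the standard deviation is about 3. A deviation of $\varepsilon\Delta = 20$ essentially never occurs in a few thousand trials, in line with a tail that decays exponentially in $\Delta$.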



4.2 Bottom Vertices 

The analysis for the “bottom” vertices in Algorithm P is more complicated in 
several respects. It is useful to see why so that one can appreciate the need for 
a more sophisticated analysis. 

To start with, one could introduce an indicator random variable $X_e$ for each edge $e$ incident upon a bottom vertex $v$. These random variables are not independent, however. Consider a four-cycle with vertices $v, a, w, b$, where $v$ and $w$ are bottom vertices and $a$ and $b$ are top vertices. Let’s refer to the process of selecting the winner (step 2 of algorithm P) as “the lottery”. Suppose that we are given the information that edge $va$ got tentative colour red and lost the lottery, i.e. $X_{va} = 0$, and that edge $vb$ got tentative colour green. We’ll argue intuitively that, given this, it is more likely that $X_{vb} = 0$. Since edge $va$ lost the lottery, the probability that edge $wa$ gets tentative colour red increases. In turn, this increases the probability that edge $wb$ gets tentative colour green, which implies that edge $vb$ is more likely to lose the lottery. So not only are the $X_e$’s not independent, but the dependency among them is particularly malicious.

One could hope to bound this effect by using Talagrand’s inequality in its simplest form. This is also ruled out, however, for two reasons. The first is that the tentative colour choices of the edges around a vertex are not independent. This is because the edges incident on a vertex are assigned a permutation of the colours. The second reason applies even if we pretend that all edges act independently. The new degree of $v$, a bottom vertex in algorithm P, is a function $f = f(T_e, e \in N(v))$, where $N(v)$ is the set of edges at distance at most 2 from $v$. Thus $f$ depends on as many as $\Delta(\Delta - 1) = \Theta(\Delta^2)$ edges. Even if $f$ is Lipschitz with constants $d_i = 1$, this is not enough to get a strong enough bound, because $d = \sum_i d_i^2 = \Theta(\Delta^2)$. Applying Theorem 1 as above would give the bound

$$ \Pr\big[|f - E[f]| > t\big] \le 2 \exp\left(-\frac{t^2}{\Theta(\Delta^2)}\right). $$

This bound, however, is useless for $t = \varepsilon E[f]$, since $E[f] \approx \Delta/e$.

We will use Marton’s extension of Talagrand’s inequality to handle the dependence, and we shall use the asymmetric and non-uniform nature of condition (11) to control the effects of the individual random choices much more effectively.

Let $N^1(v)$ denote the set of “direct” edges, i.e. the edges incident on $v$, and let $N^2(v)$ denote the set of “indirect” edges, that is, the edges incident on a neighbour of $v$. Let $N(v) := N^1(v) \cup N^2(v)$. The number of edges unsuccessfully coloured at vertex $v$ is a function $f(T_e, e \in N(v))$.

For a tentative colouring $T_e = c_e$, choose the coefficients $\alpha_e(c)$ as follows. Recall that edges compete in a “lottery” at the top vertices: for each colour, one “winner” is picked by a top vertex out of all the edges incident on it that receive the same tentative colour. Choose $\alpha_e$ to be 1 for all unsuccessfully coloured edges around $v$ and for all “winners” $(w, z)$ such that the edge $(v, w)$ took part in the lottery that $(w, z)$ won. Hence $(w, z)$ was responsible for $(v, w)$ being unsuccessful and serves as a witness for this fact. All other $\alpha_e$ are 0. Thus at most $2\Delta$ edges have non-zero $\alpha_e$ values, and hence $\sum_e \alpha_e^2 \le 2\Delta$. To see that (11) holds, let us look at an unsuccessfully coloured edge $e$ in the colouring $x$. If it is also unsuccessful in the colouring $y$, it is counted in the term $f(y)$. Otherwise, at least one of $e$ or its “witness” $e'$ must be coloured differently in $y$, and this will be counted in the second term on the right-hand side.

The underlying space is the product of copies of $S_\Delta$, one for each vertex $u$ with $d(u, v) = 2$ and one for $v$ itself.

Now, applying the measure concentration result for the $m$-fold product of the symmetric group $S_\Delta$ from Marton’s inequality, namely (9), and using the second part of Theorem 1, we arrive at the following sharp concentration result:

Theorem 3. Let $v$ be a bottom vertex in algorithm P and let $f$ be the number of edges successfully coloured around $v$ in one stage of either algorithm. Then,

Pr[|/-E[/l|>t]<2exp(-2^^). 

For $t = \varepsilon\Delta$, this gives a probability that decreases almost exponentially in $\Delta$. As remarked earlier, if $\Delta \gg \log n$, this implies that the new max degree is sharply concentrated around the mean (with failure probability roughly inverse polynomial in $n$).
One can improve this by using the stronger inequality (10) for the symmetric group. This gives the bound:

Pr[|/-M[/]|>t]<2exp • 

We first gave the result with Marton’s inequality because, although it gives a somewhat weaker bound in this example, it illustrates a general method with wider applicability.

5 A General Framework 

The method used in the last section is actually applicable in a much more general way to the analysis of a wide range of randomised algorithms in a distributed setting. In this section, we outline a fairly general framework in which such concentration results can be directly inferred. Let $f$ be a function to be computed by a randomised local algorithm in a distributed environment represented by a graph $G = (V, E)$. We shall lay down conditions on $f$ and on algorithms computing $f$ locally that will enable the methods in the previous section to be extended to derive a sharp concentration result on $f$. In particular, we indicate how the analyses of the edge colouring algorithm above [8], of the edge and vertex colouring algorithms from [2, 3], and of a vertex colouring process in [7] all follow directly.

Suppose that $f$ is a function determined by each vertex $v$ of the graph assigning a label $\ell(v)$ to itself and labels $\ell(v, e)$, $e \in N(v)$, to its incident edges by some randomised process. In the edge colouring problem, the labels on the edges are their colours (we may assume, for instance, that the lower-numbered vertex assigns the colour to an incident edge) and the vertex labels are empty; in the vertex colouring problems, the vertex labels are the colours and the edge labels are empty.

The two necessary and sufficient conditions for sharp concentration of $f$ are as follows:

L: Locality of the function: The function $f$ is Lipschitz: changing any one label changes the value of $f$ by at most 1. Furthermore, following Spencer [9], $f$ is $h$-certifiable for some function $h : \mathbb{R} \to \mathbb{R}$, i.e. for each $x$ there is a subset $I = I(x)$ (the “certificate”) of the labels of size at most $h(f(x))$ such that for any other $y$ agreeing with $x$ on $I$, $f(y) \ge f(x)$. In edge colouring, the function $f$ is the number of edges unsuccessfully coloured around a vertex. It is clearly Lipschitz. It is also $h$-certifiable for $h(x) = 2x$, since each unsuccessfully coloured edge can be attributed to one other adjacent edge of the same colour.

M: Concentration of Measure in the Probability Space: There is concentration of measure in the underlying space in one of the two forms (13) or (16) with some positive coefficient $a$. Marton’s inequality gives a method for determining this coefficient as $a = \sqrt{UV}$, for $U$ and $V$ determined by (5) through (7). Two particularly important cases in which this obtains are the following. First, we assume that the vertex label assigned by a vertex is independent of the edge labels it assigns and that each vertex acts independently of the others.

I: The labels $\ell(v, e)$ assigned by a vertex to its incident edges are independent of each other. In this case, the algorithm defines a fully independent probability space and Talagrand’s inequality (2) applies.

S: The labels $\ell(v, e)$ assigned by a vertex to its incident edges are given by a permutation distribution. In this case, the algorithm is symmetric with respect to the labels. Marton’s theorem yields measure concentration in the form (16) with $a = \sqrt{H_n}$ for permutations of $n$ labels (by (9)), and Talagrand’s inequality for the symmetric group gives measure concentration in the form (13) with $a = 16$.

The condition I obtains in the algorithms in [3, 7] while the condition S obtains 
in the algorithm of [8] discussed in detail above. 

Theorem 4. Let $f$ be a Lipschitz function which is $h$-certifiable and is computed by an algorithm satisfying the measure concentration property M for some $a > 0$. Then

Pr[|/-«[/l|>,]<2exp(-^). 

In particular, if the algorithm satisfies either full independence I or is symmetric, satisfying S, then we have the concentration result:

Pr[l/-H[/]|>t]<2exp(-jj^), 

where the constant $c$ is 4 for the independent case and 16 for the symmetric case.
Abstract. We are given a sequence of items that can be packed into m unit size bins. In the classical bin packing problem we fix the size of the bins and try to pack the items in the minimum number of such bins. In contrast, in the bin-stretching problem we fix the number of bins and try to pack the items while stretching the size of the bins as little as possible. We present two on-line algorithms for the bin-stretching problem that guarantee a stretching factor of 5/3 for any number m of bins. We then combine the two algorithms and design an algorithm whose stretching factor is 1.625 for any m. The analysis for the performance of this algorithm is tight. The best lower bound for any algorithm is 4/3 for any m > 2. We note that the bin-stretching problem is also equivalent to the classical scheduling (load balancing) problem in which the value of the makespan (maximum load) is known in advance.

Keywords. On-line algorithms, approximation algorithms, bin-stretching, 
load balancing, scheduling, bin-packing. 



1 Introduction 

The on-line bin-stretching problem is defined as follows. We are given a sequence of items that can be packed into m bins of unit size. We are asked to pack them in an on-line fashion, minimizing the stretching factor of the bins. In other words, our goal is to stretch the sizes of the bins as little as possible to fit the sequence of items. Bin-stretching is somewhat related to the bin-packing problem [10, 13, 18]. In both cases all the items are to be packed in bins of a certain size. However, in bin-packing the goal is to minimize the number of bins, while in bin-stretching the number of bins is fixed and the goal is to minimize the stretching factor of the bins. Hence, results for bin packing do not seem to imply results for the bin-stretching problem.

A bin-stretching algorithm is defined to have a stretching factor $\beta$ if, for every sequence of items that can be assigned to m bins of unit size, the algorithm assigns the items to m bins of size at most $\beta$.

The motivation for our problem comes from the following file allocation problem. Consider a case in which a set of files is stored on a system of m servers, each of some unit capacity. The files are sent one by one to a remote system of m servers in some order. The only information the remote system has on the files is that they were originally stored on m servers of unit capacity. Our goal is to design an algorithm that can assign the arriving sequence of files on the remote system with the minimum capacity required. An algorithm for our problem whose stretching factor is $\beta$ can assign the sequence of files to servers of capacity $\beta$.

States-Israel Binational Science Foundation (BSF).

It is also natural to view the bin-stretching problem as a scheduling (load balancing) problem. In the classical on-line scheduling (load balancing) problem there are m identical machines and n jobs arriving one by one. Each job
has some weight and should be assigned to a machine upon its arrival. The 
makespan (load) of a machine is the sum of the weights of the jobs assigned 
to it. The objective of an assignment algorithm is to minimize the makespan 
(maximum load) over all machines. In the bin-stretching problem we have the 
additional information that the optimal load is some known value and the goal 
is to minimize the maximum load given this information. 

It is clear that an upper bound for the classical scheduling (load balancing) 
problem is also an upper bound for the bin-stretching problem since we may 
ignore the knowledge of the optimal makespan (load). The classical scheduling 
problem was first introduced by Graham [14, 15], who showed that the greedy algorithm has a performance ratio of exactly $2 - 1/m$, where m is the number of
machines. Better algorithms and lower bounds are shown in [7, 8, 9, 11, 12, 19, 
21]. Recently, Albers [1] designed an algorithm whose performance ratio is 1.923 
and improved the lower bound to 1.852. 

The only previous result on bin-stretching is for two machines (bins). Kellerer et al. [20] showed that the performance ratio is exactly 4/3 for two machines. For m > 2 there were no algorithms for bin-stretching that achieve a better performance than those for scheduling. In this paper we provide, for the first time, algorithms for bin-stretching on an arbitrary number of machines (bins) that achieve better bounds than the scheduling/load-balancing results. Specifically, we show the following results:

— Two algorithms for the bin-stretching problem whose stretching factor is 5/3 for any number m of machines (bins).

— An improved algorithm, combining the above two algorithms, whose stretching factor is 1.625 for any number m of machines (bins). Our analysis of the stretching factor of this algorithm is tight (for large m).

— For a fixed number m > 3, we get an upper bound which is better than 1.625 for m < 20.

— Also, we easily extend the lower bound of 4/3 on the stretching factor of any deterministic algorithm from m = 2 to any number m > 2.

Observe that the additional information that bin-stretching has over the schedul- 
ing problem really helps in improving the performance of the algorithms. More- 
over, our upper bounds for the bin-stretching problem are lower than the lower 
bounds for the classical load balancing problem for all m > 2 and this fact 
separates the two problems. 
Note that the notion of stretching factor has already been used for various
problems and, in particular, for scheduling. A paradigm that is used for attacking 
many of the off-line and on-line problems is to design algorithms that know an 
upper bound on the value of the optimal algorithm. Binary search for the optimal 
value is used in the off-line setting. In fact, this is the way that scheduling is 
reduced to bin-stretching by the polynomial approximation scheme of [17]. This 
paradigm is also used for the related machines model [16] which corresponds to 
bins of different sizes. In the on-line case the paradigm of stretching factor is 
used with a doubling technique. Reducing the case of unknown optimal value to 
known optimal value results in losing a factor of 4 [2]. The notion of stretching
factor has also been used in the temporary jobs model where jobs arrive and 
depart at arbitrary times [3, 4, 5, 6]. 



2 Notation 



Let $M$ be a set of machines (bins) and $J$ a sequence of jobs (items) that have to be assigned to the machines (bins). Each job $j$ has an associated weight, $w_j > 0$. As job $j$ arrives, it must be permanently assigned to one of the machines. An assignment algorithm selects a machine $i$ for each arriving job $j$. Whenever we speak about time $j$, we mean the state of the system after the $j$th job is assigned. Let $l_i(j)$ denote the load on machine $i$ at time $j$, i.e., the sum of the weights of all the jobs on machine $i$ at time $j$. The cost of an assignment algorithm $A$ on a sequence of $n$ jobs $J$ is defined as the maximum load over all machines, or,

$$ C_A(J) = \max_{i \in M} l_i(n). $$

The objective of an on-line bin-stretching algorithm is to minimize the stretching factor $\beta$, i.e., the cost of a sequence of jobs given that the optimal off-line assignment algorithm (that knows the sequence of jobs in advance) assigns them
at a unit cost. This is unlike the classical on-line scheduling (load balancing) 
problems where the optimal cost is not known in advance and the performance 
is measured by the regular competitive ratio which is defined as the supremum of 
the ratio between the cost of the on-line assignment and the cost of the optimal 
off-line assignment. 

We say that a sequence of jobs can be assigned to m machines by an optimal 
off-line algorithm if it can be assigned with a unit cost. We note some simple 
properties of such sequences of jobs. First, the weight of each job must be at most 1, since a job that is larger than 1 cannot be assigned by any algorithm without
creating a load larger than 1. Second, the sum of weights of all jobs in a sequence 
of jobs is at most m, the number of machines. That follows from the fact that 
the optimal off-line algorithm can assign jobs with total weight of at most 1 to 
each machine. 
3 Two algorithms with 5/3 stretching factor

In this section we present two algorithms with a stretching factor of 5/3 for the 
on-line bin-stretching problem. These are actually two families of algorithms. 
For each family we prove the same 5/3 upper bound. 

We start with a simple algorithm with a stretching factor of 2: put each 
arriving job on an arbitrary machine such that the resulting load on that machine 
will not exceed 2. Obviously, if the algorithm does not fail to find such a machine, it has a stretching factor of 2 by definition. In order to show that such a machine
is always available we notice that there must be a machine whose load is at most 
1. Otherwise, all the machines have loads larger than 1 which contradicts the 
fact that the optimal solution has maximal load 1. Since the weight of each job 
is at most 1, each arriving job can be assigned to some machine which implies 
that the algorithm never fails. 
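The simple stretching-factor-2 rule fits in a few lines. This Python sketch assumes the input sequence is feasible for $m$ unit bins. It resolves the "arbitrary machine" choice by taking the first machine that fits, which is one of many allowed choices. `assign_within_two` is a hypothetical name.

```python
def assign_within_two(weights, m):
    """Place each job on some machine whose load stays at most 2.

    Assumes the sequence is feasible for m unit bins, so every weight is at
    most 1 and, at each arrival, some machine has load at most 1. The
    'arbitrary' machine is taken to be the first one that fits (first fit).
    Returns the machine chosen for each job and the final loads.
    """
    loads = [0.0] * m
    assignment = []
    for w in weights:
        for i in range(m):
            if loads[i] + w <= 2:
                loads[i] += w
                assignment.append(i)
                break
        else:
            # Cannot happen for feasible input, by the argument above.
            raise ValueError("no machine can take the job; input was not feasible")
    return assignment, loads
```

For six jobs of weight 0.5 on three machines, first fit piles four jobs onto machine 0 (load 2.0), although the optimum has load 1. This shows why the threshold-based algorithms below are needed to do better than 2.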

Our algorithms use a threshold $\alpha$ to classify machines according to their loads. An appropriate choice of $\alpha$ will lead, as described later, to an algorithm whose stretching factor is $1 + \alpha$.

Definition 1. A machine is said to be short if its load is at most $\alpha$. Otherwise, it is tall.

At the arrival time of job j, we define three disjoint sets of machines based 
on the current load and the job’s weight. 

Definition 2. When job $j$ arrives, $1 \le j \le n$, define the following three disjoint sets:

— $S_1(j) = \{ i \in M \mid l_i(j-1) + w_j \le \alpha \}$

— $S_2(j) = \{ i \in M \mid l_i(j-1) \le \alpha,\ \alpha < l_i(j-1) + w_j \le 1 + \alpha \}$

— $S_3(j) = \{ i \in M \mid l_i(j-1) > \alpha,\ l_i(j-1) + w_j \le 1 + \alpha \}$

The set $S_1$ consists of machines that are short and remain short if the current
job is placed on them. The second set, $S_2$, consists of machines that are short but
become tall if the job is placed on them. The last set, $S_3$, consists of machines that
are tall but whose load remains at most $1 + \alpha$ if the job is placed on them. Note that there
may be machines which are not in any of the sets. We omit the indices $j$ and $\alpha$ when
they are clear from the context.
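Definition 2 translates directly into a classification routine. The sketch below illustrates our own conventions and is not the authors' code. Machines are 0-indexed, `loads[i]` stands for $l_i(j-1)$, `w` stands for $w_j$, and `classify` is a hypothetical name. Machines that belong to none of the three sets are left out.

```python
def classify(loads, w, alpha):
    """Return (S1, S2, S3) of Definition 2 as lists of machine indices."""
    s1, s2, s3 = [], [], []
    for i, l in enumerate(loads):
        if l + w <= alpha:
            s1.append(i)                 # short and stays short
        elif l <= alpha and l + w <= 1 + alpha:
            s2.append(i)                 # short, becomes tall (alpha < l + w)
        elif l > alpha and l + w <= 1 + alpha:
            s3.append(i)                 # tall, stays within 1 + alpha
    return s1, s2, s3


# alpha = 1/2 keeps every comparison exact in binary floating point.
print(classify([0.0, 0.25, 0.5, 0.75, 1.25, 1.5], 0.25, 0.5))
# ([0, 1], [2], [3, 4])
```

The machine with load 1.5 appears in no set, because $1.5 + 0.25 > 1 + \alpha$.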

Using this definition we can now describe the two algorithms: 

ALG1$_\alpha$: When job $j$ arrives:

- Put the job on any machine from the set $S_3$ or $S_1$, but not on an empty
machine from $S_1$ if there is a non-empty machine in $S_1$.

- If $S_1 = S_3 = \emptyset$, then put the job on the least loaded machine from the set
$S_2$.

- If $S_1 = S_2 = S_3 = \emptyset$, then report failure.

ALG2$_\alpha$: When job $j$ arrives:

- Put the job on any machine from the set $S_1$.

- If $S_1 = \emptyset$, then put the job on any machine from the set $S_3$.

- If $S_1 = S_3 = \emptyset$, then put the job on the least loaded machine from the set
$S_2$.

- If $S_1 = S_2 = S_3 = \emptyset$, then report failure.

Notice that these two algorithms are actually families of algorithms. In the
first algorithm we are free to choose how to select a machine from $S_3$ and whether
we put a job on a machine from $S_1$ or from $S_3$. In the second algorithm we are
free to choose how to select a machine from $S_1$ and from $S_3$.
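To make the freedom within the two families concrete, here is a Python sketch of one member of each family. The tie-breaking rule (lowest machine index) is our own hypothetical choice. The names `alg1`, `alg2` and `run` are illustrative, and `classify` restates Definition 2 so that the block is self-contained. Each chooser returns a machine index, or `None` on failure.

```python
def classify(loads, w, alpha):
    """Sets S1, S2, S3 of Definition 2 (0-indexed machines)."""
    s1, s2, s3 = [], [], []
    for i, l in enumerate(loads):
        if l + w <= alpha:
            s1.append(i)
        elif l <= alpha and l + w <= 1 + alpha:
            s2.append(i)
        elif l > alpha and l + w <= 1 + alpha:
            s3.append(i)
    return s1, s2, s3


def alg1(loads, w, alpha):
    s1, s2, s3 = classify(loads, w, alpha)
    nonempty_s1 = [i for i in s1 if loads[i] > 0]
    # An empty machine of S1 is allowed only if S1 has no non-empty machine.
    candidates = s3 + (nonempty_s1 if nonempty_s1 else s1)
    if candidates:
        return candidates[0]               # hypothetical tie-break
    if s2:
        return min(s2, key=lambda i: loads[i])
    return None                            # S1 = S2 = S3 = empty: failure


def alg2(loads, w, alpha):
    s1, s2, s3 = classify(loads, w, alpha)
    if s1:
        return s1[0]                       # any machine of S1
    if s3:
        return s3[0]                       # any machine of S3
    if s2:
        return min(s2, key=lambda i: loads[i])
    return None


def run(choose, weights, m, alpha):
    """Feed jobs to a chooser (alg1 or alg2) and return the final loads."""
    loads = [0.0] * m
    for w in weights:
        i = choose(loads, w, alpha)
        if i is None:
            raise RuntimeError("algorithm failed")
        loads[i] += w
    return loads


# Optimal bins: {0.25, 0.75}, {0.5, 0.5}, {1.0}; maximal load 1.
jobs = [0.25, 0.5, 0.5, 0.75, 1.0]
print(run(alg1, jobs, 3, 2 / 3), run(alg2, jobs, 3, 2 / 3))
# [1.0, 1.5, 0.5] [1.0, 1.5, 0.5]
```

On this instance both members reach maximal load 1.5, which is within the bound $1 + \alpha = 5/3$.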

Note that since the algorithms assign job $j$ only to machines from the sets
$S_1(j)$, $S_2(j)$ and $S_3(j)$, their stretching factor is at most $1 + \alpha$ as long as they
do not fail. For $1 \le i \le 3$ let $J_i$ be the set of jobs $j$ assigned by the algorithm to a machine in
$S_i(j)$ at their arrival time.




Fig. 2. $J_1$, $J_2$ and $J_3$



Theorem 1. ALG1$_\alpha$ above never fails for $\alpha \ge 2/3$. Therefore, for $\alpha = 2/3$ it
has a stretching factor of 5/3.

Theorem 2. ALG2$_\alpha$ above never fails for $\alpha \ge 2/3$. Therefore, for $\alpha = 2/3$ it
has a stretching factor of 5/3.

In order to prove the above theorems we assume, by contradiction, that ALG1$_\alpha$
or ALG2$_\alpha$ fails on the last job of some sequence of $n + 1$ jobs, and that this
sequence can be assigned by an optimal algorithm with maximal load 1. We start with the following
simple lemmas:

Lemma 1. At time $n$ all the machines are tall and there are at least two machines
whose load is less than 1.

Proof. At time $n$, when the last job arrives, the three sets $S_1$, $S_2$ and $S_3$ are
empty. Hence $l_i(n) + w_{n+1} > 1 + \alpha$ for all $1 \le i \le m$. Since the weight of each job
is at most 1, $l_i(n) > 1 + \alpha - w_{n+1} \ge \alpha$ for all $1 \le i \le m$. Thus, all the machines
are tall. Assume by contradiction that all the machines except one machine $i$ have
loads of 1 or more. When the last job comes, $l_i(n) + w_{n+1} > 1 + \alpha > 1$, and
since all other machines also have loads of 1 or more, the sum of
all loads together with the last job is above $m$. This contradicts the fact that the sequence of jobs can be
assigned by an optimal algorithm.

Corollary 1. The last job is larger than a. 

Proof. At time $n$, when the last job arrives, there is a machine $i$ whose load is less
than 1 by Lemma 1. Since the algorithm fails to assign the last job,
$1 + w_{n+1} > l_i(n) + w_{n+1} > 1 + \alpha$, i.e., $w_{n+1} > \alpha$.

To utilize some of our lemmas for the improved algorithm we use a more
general formulation. Consider a subset $M' \subseteq M$ of machines. We define the
notion of a composed algorithm $D(\mathrm{ALG}, M')$, where ALG is ALG1$_\alpha$ or ALG2$_\alpha$,
on a sequence of jobs $I$ and a set of machines $M$ as follows. The algorithm decides
on an arbitrary subset $I' \subseteq I$ and assigns its jobs to machines in $M'$, and it assigns the
rest of the jobs to machines in $M \setminus M'$. The assignment of the jobs in $I'$ is done by
running algorithm ALG on the set of machines $M'$. However, the jobs in $I \setminus I'$
are assigned to machines in $M \setminus M'$ in an arbitrary way. Moreover, we make
no assumption on the sequence $I$; for example, the optimal algorithm may not
be able to assign it to $M$ without exceeding a load of 1 (in particular, jobs
of weight larger than 1 may exist).

Note that $D(\mathrm{ALG}, M')$ is the same as ALG for $M' = M$. We already proved
that if ALG1$_\alpha$ or ALG2$_\alpha$ fails on the $(n+1)$-st job of a sequence $J$ of jobs, then at time
$n$ all the machines are tall and there are two machines whose load is less than
1. For the composed algorithms we instead assume that, after a sequence $I$ of
$n$ jobs has been assigned by $D(\mathrm{ALG}, M')$, all the machines in $M'$ are tall
and two of them have loads below 1. This assumption is used up to and including
Lemma 6. Also, we assume that $0 < \alpha < 1$ unless otherwise specified.

Define the raising job $k_i$ of machine $i \in M'$ as the job that raises machine $i$
from being short to being tall. More formally, $l_i(k_i) > \alpha$ and $l_i(k_i - 1) \le \alpha$. The
raising jobs are well defined since we assumed that all machines in $M'$ are tall.
Rename the indices of the machines in $M'$ to $1, \ldots, m'$ such that $k_1 < k_2 < \cdots <
k_{m'}$, i.e., the machines in $M'$ are ordered by the time at which they
crossed $\alpha$. From now on, all the indices are according to the new order. Note
that the set of raising jobs is $J_2$. Denote by $s_1, s_2$ the two machines in $M'$
($s_1 < s_2$) whose load is less than 1 at time $n$.

Lemma 2. If at time $n$ the load of some machine $i \in M'$ is at most $l$, then
$w_{k_{i'}} > 1 + \alpha - l$ for every $i' > i$, $i' \in M'$.

Proof. Both ALG1$_\alpha$ and ALG2$_\alpha$ assign jobs to machines from $S_2$ only if the two
other sets are empty. By definition of $k_{i'}$, at time $k_{i'} - 1$ job $k_{i'}$ arrived and was
assigned to machine $i'$. By the definitions of $S_2$ and $k_{i'}$, machine $i'$ was in the
set $S_2(k_{i'})$, and therefore the sets $S_1(k_{i'})$ and $S_3(k_{i'})$ were empty. Machine $i$ was
already tall at that time since $i < i'$, so it could only have belonged to $S_3(k_{i'})$; as this
set was empty, $l_i(k_{i'} - 1) + w_{k_{i'}} > 1 + \alpha$. Since loads never decrease, this gives
$w_{k_{i'}} > 1 + \alpha - l_i(k_{i'} - 1) \ge 1 + \alpha - l_i(n) \ge 1 + \alpha - l$.

Since we assumed that the load of machine $s_1$ is less than 1 at time $n$, the lemma
above with $l = 1$ implies:

Corollary 2. Jobs $k_i$ for $s_1 < i \le m'$ are larger than $\alpha$.

Let $f_i = l_i(k_i - 1)$ for $1 \le i \le m'$. This is the load of each machine just
before it was raised by the raising job.
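The raising jobs $k_i$ and the values $f_i$ can be read off an assignment trace. The following Python sketch makes three assumptions: jobs are numbered from 1 as in the text, `assignment[j-1]` is the (0-indexed) machine that receives job $j$, and machines that never become tall are omitted. `raising_jobs` is a hypothetical helper name.

```python
def raising_jobs(weights, assignment, alpha):
    """Return (machine, k_i, f_i) triples sorted by k_i.

    k_i is the job that lifts machine i from load <= alpha to load > alpha,
    and f_i = l_i(k_i - 1) is the load just before that job.
    """
    loads = {}
    raised = {}
    for j, (w, i) in enumerate(zip(weights, assignment), start=1):
        before = loads.get(i, 0.0)
        loads[i] = before + w
        if before <= alpha < loads[i]:   # loads only grow, so this happens once
            raised[i] = (j, before)
    return sorted(((i, k, f) for i, (k, f) in raised.items()),
                  key=lambda t: t[1])


# Trace of five jobs on three machines; machine 2 never becomes tall.
print(raising_jobs([0.25, 0.5, 0.5, 0.75, 1.0], [0, 1, 2, 0, 1], 2 / 3))
# [(0, 4, 0.25), (1, 5, 0.5)]
```

Sorting by $k_i$ reproduces the renaming $k_1 < k_2 < \cdots$ used in the text.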




Fig. 3. The series $f_i$. Only machines from $M'$ are shown for clarity.



Lemma 3. For $i' > i$, both in $M'$, $f_i \le l_{i'}(k_i - 1) \le f_{i'}$.

Proof. At time $k_i - 1$ the load of machine $i$ is $f_i$ by definition. At this time,
by definition of $k_i$, machine $i$ is in the set $S_2$, which means that $S_1$ and $S_3$ are
empty. Thus, at the same time, each machine $i' > i$ (which is still short) must be in $S_2$ or not in any
of the sets. Note that if the load of machine $i'$ is below $f_i$ at time $k_i - 1$, then
it is in $S_2(k_i)$, since machine $i$, whose load is higher, is in $S_2(k_i)$. Therefore the
load of machine $i'$ is at least $f_i$, since both algorithms choose the least loaded
machine from $S_2$. Machine $i'$ is still short and loads never decrease, so its load at time $k_i - 1$ is at most its load $f_{i'}$ at time $k_{i'} - 1$.

Corollary 3. The series $f_i$, $1 \le i \le m'$, is non-decreasing.

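Lemma 4. For $i \le s_2$, $f_i < 1 - \alpha$.

Proof. According to Corollary 2, $w_{k_{s_2}} > \alpha$. Since the load of machine $s_2$ is below
1 at time $n$,

$$1 > l_{s_2}(n) \ge l_{s_2}(k_{s_2}) = l_{s_2}(k_{s_2} - 1) + w_{k_{s_2}} = f_{s_2} + w_{k_{s_2}}.$$

Therefore,

$$f_{s_2} < 1 - w_{k_{s_2}} < 1 - \alpha.$$

By Corollary 3, $f_i < 1 - \alpha$ for $i \le s_2$.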



Note that up to now our proof was not specific to one of the algorithms. Now
we focus our attention on the first algorithm. Recall that we still assume that
the set of jobs $I$ is assigned by algorithm $D(\mathrm{ALG1}_\alpha, M')$ or $D(\mathrm{ALG2}_\alpha, M')$ to
the set of machines $M$.



Lemma 5. At any time during the run of $D(\mathrm{ALG1}_\alpha, M')$, there is at most one
non-empty machine in $M'$ whose load is at most $\alpha/2$.

Proof. Assume by contradiction that at a certain time there are two such machines.
Let $j$ be the first job whose assignment created two such machines. Thus,
job $j$ arrived and was placed on an empty machine $i_2$ while another non-empty
machine $i_1$ had a load of at most $\alpha/2$. Clearly $w_j \le \alpha/2$ and $l_{i_1}(j-1) + w_j \le \alpha$.
Therefore $i_1 \in S_1(j)$, and since $i_1$ is a non-empty machine of $S_1(j)$, ALG1$_\alpha$ would not have placed job $j$ on the empty machine $i_2$.

Lemma 6. Assume $\alpha \ge 2/3$ and $D(\mathrm{ALG1}_\alpha, M')$ assigns a set of $n$ jobs $I$ to a
set of machines $M$. Then the weight of each job $k_i$, $1 \le i \le m'$, is more than $\alpha$.

Proof. We have already seen in Corollary 2 that jobs $k_i$ for $s_1 < i \le m'$ are larger
than $\alpha$. Now we show that jobs $k_i$ for $i \le s_1$ are also larger than $\alpha$.

By Lemma 4, $f_{s_1}$ and $f_{s_2}$ are both below $1 - \alpha$. According to Lemma 3,
$f_{s_1} \le l_{s_2}(k_{s_1} - 1) \le f_{s_2}$. Recall that the load of machine $s_1$ at time $k_{s_1} - 1$ is
$f_{s_1}$. At that time, the loads of machines $s_1$ and $s_2$ are below $1 - \alpha \le \alpha/2$ (as $\alpha \ge 2/3$). Thus,
by Lemma 5, the less loaded machine, $s_1$, is empty, i.e., $f_{s_1} = l_{s_1}(k_{s_1} - 1) = 0$. By
Corollary 3, $f_i = 0$ for all machines $i \le s_1$. A small $f_i$ implies that machine $i$ has
a large raising job. More formally, for $i \le s_1$:

$$w_{k_i} = l_i(k_i) - l_i(k_i - 1) = l_i(k_i) - 0 > \alpha.$$

Now we are ready to complete the proof of Theorem 1. Assume that ALG1$_\alpha$
fails on the $(n+1)$-st job of a sequence $J$ of jobs. After the $n$ jobs have been assigned,
all the machines are tall and there are two machines whose load is less than 1
by Lemma 1. We take $M' = M$ and therefore $I' = I$, where $I$ is the sequence of jobs
$J$ without the last job. The previously defined series $k_i$ is now defined over all
machines since we took $M' = M$. By Lemma 6, for $\alpha \ge 2/3$, this implies that
there are $m$ jobs larger than $\alpha$. Corollary 1 shows that the last job is also larger
than $\alpha$. Hence there are $m + 1$ jobs larger than $\alpha > 1/2$. This contradicts the fact
that the number of jobs of weight larger than 1/2 is at most $m$, since the optimal
algorithm can assign at most one such job to each machine. This completes the
proof of Theorem 1.

The proof of Theorem 2, i.e., that ALG2$_\alpha$ has the same stretching factor, is omitted.



4 Improved Algorithm 

In this section we present an improved algorithm whose stretching factor is 1.625. 
The improved algorithm combines both of the previous algorithms into a single 
algorithm. 

At the arrival time of job j we define five disjoint sets of machines based on 
the current load and the job’s weight. 

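Definition 3. When job $j$ arrives, $1 \le j \le n$, define the following five sets:

- $S_{11}(j) = \{i \in M \mid l_i(j-1) + w_j \le \alpha,\ l_i(j-1) + w_j \le 2\alpha - 1\}$

- $S_{12}(j) = \{i \in M \mid l_i(j-1) + w_j \le \alpha,\ l_i(j-1) \le 2\alpha - 1,\ l_i(j-1) + w_j > 2\alpha - 1\}$

- $S_{13}(j) = \{i \in M \mid l_i(j-1) + w_j \le \alpha,\ l_i(j-1) > 2\alpha - 1\}$

- $S_2(j) = \{i \in M \mid l_i(j-1) \le \alpha,\ \alpha < l_i(j-1) + w_j \le 1 + \alpha\}$

- $S_3(j) = \{i \in M \mid l_i(j-1) > \alpha,\ l_i(j-1) + w_j \le 1 + \alpha\}$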

Note that the previously defined set $S_1$ is split into three sets according to a low
threshold of $2\alpha - 1$. We still use the notation $S_1$ for the union of these three sets.
We omit the indices $j$ and $\alpha$ when they are clear from the context. The sets $J_1$,
$J_2$ and $J_3$ are defined as in the previous section.
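Definition 3 refines $S_1$ by the low threshold $2\alpha - 1$. A small Python sketch of this five-way classification follows. As before, the function name `classify5`, the 0-indexed machines and the dictionary keys are our own illustrative conventions.

```python
def classify5(loads, w, alpha):
    """Split machines into S11, S12, S13, S2, S3 as in Definition 3."""
    low = 2 * alpha - 1
    sets = {"S11": [], "S12": [], "S13": [], "S2": [], "S3": []}
    for i, l in enumerate(loads):
        if l + w <= alpha:                       # machine is in S1
            if l + w <= low:
                sets["S11"].append(i)            # stays at most 2*alpha - 1
            elif l <= low:
                sets["S12"].append(i)            # crosses 2*alpha - 1
            else:
                sets["S13"].append(i)            # already above 2*alpha - 1
        elif l <= alpha and l + w <= 1 + alpha:
            sets["S2"].append(i)
        elif l > alpha and l + w <= 1 + alpha:
            sets["S3"].append(i)
    return sets


# alpha = 3/4 gives the exact thresholds 2*alpha - 1 = 0.5 and 1 + alpha = 1.75.
print(classify5([0.0, 0.25, 0.5, 0.625, 0.75, 1.5], 0.125, 0.75))
# {'S11': [0, 1], 'S12': [2], 'S13': [3], 'S2': [4], 'S3': [5]}
```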

Improved Algorithm: When job $j$ arrives:

- Put the job on a machine from the set $S_1$ according to:

  • Put the job on any machine from the set $S_{13}$ or $S_{11}$, but not on an empty
  machine from the set $S_{11}$ if there is a non-empty machine in the set
  $S_{11}$.

  • If $S_{11} = S_{13} = \emptyset$, then put the job on the least loaded machine from the
  set $S_{12}$.

- If $S_1 = \emptyset$, then put the job on the earliest machine from the set $S_3$, that is,
the machine that was the first among all machines in $S_3$ to cross the threshold $\alpha$.

- If $S_1 = S_3 = \emptyset$, then put the job on the least loaded machine from the set
$S_2$.

- If $S_1 = S_2 = S_3 = \emptyset$, then report failure.
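The complete rule can be sketched in Python as follows. This illustration rests on stated assumptions and is not the authors' implementation. Ties inside $S_{11} \cup S_{13}$ are broken by lowest index, a hypothetical choice that the algorithm leaves open. The dictionary `crossed_at` records the job at which each machine crossed $\alpha$, so that the earliest machine of $S_3$ can be found. `improved_alg` is an illustrative name, and the default $\alpha = 5/8$ follows Theorem 3.

```python
def improved_alg(weights, m, alpha=5 / 8):
    """Run the improved algorithm; return final loads or raise on failure."""
    low = 2 * alpha - 1
    loads = [0.0] * m
    crossed_at = {}                  # machine -> job that made it tall
    for j, w in enumerate(weights, start=1):
        s11, s12, s13, s2, s3 = [], [], [], [], []
        for i, l in enumerate(loads):
            if l + w <= alpha:
                if l + w <= low:
                    s11.append(i)
                elif l <= low:
                    s12.append(i)
                else:
                    s13.append(i)
            elif l <= alpha and l + w <= 1 + alpha:
                s2.append(i)
            elif l > alpha and l + w <= 1 + alpha:
                s3.append(i)
        if s11 or s13:
            # ALG1-style rule: avoid an empty S11 machine if a non-empty one exists.
            nonempty = [i for i in s11 if loads[i] > 0]
            target = (s13 + (nonempty if nonempty else s11))[0]
        elif s12:
            target = min(s12, key=lambda i: loads[i])
        elif s3:
            target = min(s3, key=lambda i: crossed_at[i])  # earliest to cross alpha
        elif s2:
            target = min(s2, key=lambda i: loads[i])
        else:
            raise RuntimeError(f"failure on job {j}")
        before = loads[target]
        loads[target] += w
        if before <= alpha < loads[target]:
            crossed_at[target] = j
    return loads


print(improved_alg([0.25, 0.5, 0.5, 0.75, 1.0], 3))  # [1.0, 1.5, 0.5]
```

With $\alpha = 5/8$, the thresholds $2\alpha - 1 = 1/4$ and $1 + \alpha = 13/8$ are exact binary fractions. As a result, the comparisons in this example are not affected by rounding.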
This improved algorithm is contained in the family ALG2$_\alpha$ presented in
the last section. Our algorithm, however, specifies how jobs are placed
on machines from the sets $S_1$ and $S_3$. A machine from
$S_1$ is chosen by the method presented in ALG1$_\alpha$. When choosing a machine from $S_3$, we
prefer the earliest machine according to the order of crossing the threshold. The
proof of the theorem below is omitted.

Theorem 3. The improved algorithm above never fails for $5/8 \le \alpha \le 2/3$.
Thus, for $\alpha = 5/8$ it has a stretching factor of 13/8.

5 Lower Bounds 

In this section we prove a general lower bound of 4/3 on the stretching factor of
deterministic algorithms for any number of machines. We show a lower bound
of $5/3 - \epsilon$, for arbitrarily small $\epsilon$, for the family ALG1$_\alpha$, and a lower bound of
$13/8 - \epsilon$, for arbitrarily small $\epsilon$, on the stretching factor of our improved algorithm.
Note that it is impossible to show a lower bound of $5/3 - \epsilon$ for ALG2$_\alpha$, since the
improved algorithm is in that family. In these two cases we assume the number
of machines is large enough. The details of all the lower bounds are omitted.

References 

[1] S. Albers. Better bounds for on-line scheduling. In Proc. 29th ACM Symp. 
on Theory of Computing, pages 130-139, 1997. 

[2] J. Aspnes, Y. Azar, A. Fiat, S. Plotkin, and O. Waarts. On-line load balanc- 
ing with applications to machine scheduling and virtual circuit routing. In 
Proc. 25th ACM Symposium on the Theory of Computing, pages 623-631, 
1993. Also in Journal of the ACM 44:3 (1997) pp. 486-504. 

[3] B. Awerbuch, Y. Azar, S. Plotkin, and O. Waarts. Competitive routing of 
virtual circuits with unknown duration. In Proc. 5th ACM-SIAM Sympo- 
sium on Discrete Algorithms, pages 321-327, 1994. 

[4] Y. Azar, A. Broder, and A. Karlin. On-line load balancing. In Proc. 
33rd IEEE Symposium on Foundations of Computer Science, pages 218- 
225, 1992. Also in Theoretical Compute Science 130 (1994) pp. 73-84. 

[5] Y. Azar and L. Epstein. On-line load balancing of temporary tasks on iden- 
tical machines. In 5th Israeli Symp. on Theory of Computing and Systems, 
pages 119-125, 1997. 

[6] Y. Azar, B. Kalyanasundaram, S. Plotkin, K. Pruhs, and O. Waarts. On- 
line load balancing of temporary tasks. In Proc. Workshop on Algorithms 
and Data Structures, pages 119-130, August 1993. 

[7] Y. Bartal, A. Fiat, H. Karloff, and R. Vohra. New algorithms for an an- 
cient scheduling problem. In Proc. 24th ACM Symposium on Theory of 
Algorithms, pages 51-58, 1992. To appear in Journal of Computer and 
System Sciences. 




On-Line Bin-Stretching 



81 



[8] B. Chen, A. van Vliet, and G. Woeginger. A lower bound for randomized 
on-line scheduling algorithms. Information Processing Letters, 51:219-222, 
1994. 

[9] B. Chen, A. van Vliet, and G. J. Woeginger. New lower and upper bounds 
for on-line scheduling. Operations Research Letters, 16:221-230, 1994. 

[10] E. G. Coffman, M. R. Garey, and D. S. Johnson. Approximation algo- 
rithms for bin packing: a survey. In D. Hochbaum, editor, Approximation 
algorithms. 1996. 

[11] U. Faigle, W. Kern, and G. Turan. On the performance of online algorithms 
for partition problems. Acta Cybernetica, 9:107-119, 1989. 

[12] G. Galambos and G. J. Woeginger. An on-line scheduling heuristic with 
better worst case ratio than graham’s list scheduling. SIAM J. Computing, 
22:349-355, 1993. 

[13] M.R. Garey and D.S. Johnson. Computers and Intractability. W.H. Freeman 
and Company, San Francisco, 1979. 

[14] R.L. Graham. Bounds for certain multiprocessor anomalies. Bell System 
Technical Journal, 45:1563-1581, 1966. 

[15] R.L. Graham. Bounds on multiprocessing timing anomalies. SIAM J. Appl. 
Math, 17:263-269, 1969. 

[16] D. Hochbaum and D. Shmoys. A polynomial approximation scheme for 
scheduling on uniform processors: Using the dual approximation approach. 
SIAM Journal on Computing, 17(3):539-551, 1988. 

[17] D. S. Hochbaum and D. B. Shmoys. Using dual approximation algorithms 
for scheduling problems: Theoretical and practical results. J. of the ACM, 
34(1):144-162, January 1987. 

[18] D. S. Johnson. Near-optimal bin packing algorithms. PhD thesis, MIT, 
Cambridge, MA, 1973. 

[19] D. R. Karger, S. J. Phillips, and E. Torng. A better algorithm for an ancient 
scheduling problem. In Proc. of the 5th ACM-SIAM Symposium on Discrete 
Algorithms, pages 132-140, 1994. 

[20] H. Kellerer, V. Kotov, M. G. Speranza, and Zs. Tuza. Semi on-line algo- 
rithms for the partition problem. Operations Research Letters. To appear. 

[21] J. Sgall. On-line scheduling on parallel machines. Technical Report Tech- 
nical Report CMU-CS-94-144, Carnegie- Mellon University, Pittsburgh, PA, 
USA, 1994. 




Combinatorial Linear Programming: 
Geometry Can Help * 



Bernd Gärtner

Institut für Theoretische Informatik, ETH Zürich, ETH-Zentrum, CH-8092 Zürich,
Switzerland (gaertner@inf.ethz.ch)



Abstract. We consider a class A of generalized linear programs on the 
d-cube (due to Matousek) and prove that Kalai’s subexponential simplex 
algorithm Random-Facet is polynomial on all actual linear programs 
in the class. In contrast, the subexponential analysis is known to be 
best possible for general instances in A. Thus, we identify a “geometric” 
property of linear programming that goes beyond all abstract notions 
previously employed in generalized linear programming frameworks, and 
that can be exploited by the simplex method in a nontrivial setting. 



1 Introduction 

While Linear Programming (LP) is known to belong to the complexity class 
P [17], its combinatorial complexity in the unit cost (RAM) model is as yet 
unresolved. That is, it is not known whether there is a polynomial $p(n,d)$
such that every linear program with $n$ constraints in $d$ variables can be solved
in time $p(n,d)$, if all arithmetic operations are assumed to incur unit cost. In
other words, no strongly polynomial algorithm for LP is known. One motivation 
for getting down to this problem is the simplex method, the oldest and still most 
widely used algorithm to solve LP [6]. The simplex method naturally lends itself 
to unit cost analysis, where the basic complexity measure is the number of pivot 
steps which in turn depends on the pivot rule chosen. The hope is that eventually 
a pivot rule is discovered where this number can be bounded by a polynomial in 
n and d. 

Previous results in this direction are rather discouraging. For the pivot rule 
originally proposed by Dantzig [6], Klee and Minty have constructed a class of 
LP (the so-called “Klee-Minty cubes”) where this rule leads to an exponential 
number of steps [18]. In the sequel, such worst-case examples have been found 
by various researchers for almost all known deterministic pivot rules, see [12] for 
an overview and [2] for a new unified view of these examples. 

A few years ago, progress was made using randomized pivot rules. The
subexponential bounds independently established by Kalai [15] as well as
Matousek, Sharir and Welzl [19] are far from being polynomial, but they still
represent a breakthrough towards the goal of understanding the complexity of LP
in the RAM model. The currently best algorithm based on these results invokes
algorithms by Clarkson and achieves (for $n > d$) expected runtime

$$O(d^2 n + \exp(O(\sqrt{d \log d}))),$$

see [11] for a survey. If $n = O(d)$, one can even prove a bound of $\exp(O(\sqrt{d}))$
[15].

A remarkable fact about the subexponential pivot rules is that they are 
combinatorial in the sense that actual coordinates of the LP are only used to
decide whether progress is possible in a certain situation, but not to measure, let 
alone maximize that progress in any way. This is in sharp contrast to the pivot 
rules employed in practice, where strategies that adapt to the given instance as 
well as possible are typically most successful. 

In the theoretical setting, this “ignorance” is rather a strong point, in par- 
ticular when combined with randomization (which is just an elegant way to 
deal with ignorance). The worst-case constructions above all work by “penal- 
izing” behavior that is either based on the coordinates of the problem (like in 
Dantzig’s rule) or on deterministic choices foreseeable in advance (like in case 
of Bland’s rule [4]). Randomized combinatorial rules are harder to fool by such 
examples, and the subexponential bounds established in 1992 are still the best 
known worst-case bounds for LP. 

Another advantage gained by ignorance is generality; in fact, the algorithm 
by Matousek et al. requires only very basic properties of LP in order to work,
and these properties are shared by many other problems, including nonlinear and 
even nonconvex optimization problems, for which in some cases the best known 
bounds are obtained using the uniform algorithm for the general problem class 
[19,21]. The abstract class of problems amenable to this approach has been 
termed LP-type problems. 

Similarly, Kalai’s subexponential algorithm works in the more general setting 
of so-called abstract objective functions (AOF), which he uses as a tool to derive 
extremely simple and beautiful proofs for known and new facts in LP and
polytope theory [16]. While his algorithm is a primal simplex algorithm under the
pivot rule Random-Facet (which we discuss below), the algorithm of Matousek
et al. is a dual version of it.

As it turns out, the concepts of LP-type problems and abstract objective
functions are basically equivalent, and similar to still other attempts to gen- 
eralize linear programming in connection with the simplex method [1,8]. In a 
sense, these frameworks represent all properties of LP that have been found 
to be useful in combinatorial algorithms so far. No property of LP not present 
in these frameworks is known that can provably speed up the existing subex- 
ponential algorithms, or help in devising new ones with better runtimes. It is 
even possible that the subexponential bounds that have been established are a 
gross overestimate, and that the behavior is in fact polynomial on actual linear 
programs. 

At least in the abstract setting, however, the subexponential analysis is tight. 
This has been shown by Matousek [20], who constructed a class of LP-type
problems (containing LP and non-LP instances) with $n = 2d$ constraints, on most of
which the algorithm in [19] is indeed subexponentially slow. He did not succeed, however,
in proving that an actual LP in the class has this property, and the question
remains open whether linear programs have some distinguishing feature which
allows them to be handled faster by the algorithms in [15] and [19] (or any other
combinatorial algorithm).

In this paper we show that such a distinguishing feature exists at least among 
the problems in the abstract ‘worst-case’ class constructed by Matousek. In 
particular, after reformulating these problems as abstract objective functions 
(where they appear as generalized linear programs on a d-cube), we prove that 
Kalai’s Random-Facet algorithm handles the LP instances among them in 
$O(d^2)$ time. It follows that the ‘slow’ examples in the class must be non-LP
instances. The result is obtained by characterizing a certain necessary condition 
for being an LP instance in terms of a simple combinatorial property. 

Although Matousek’s class is only a small subclass of all AOF on the d-cube, 
this result is interesting in two respects. On the one hand it shows that even a 
completely ignorant pivot rule like Random-Facet can implicitly “recognize” 
LP instances, which raises hopes that it might make such a distinction also on the 
general class of all AOF, ultimately leading to a strongly polynomial algorithm 
for LP. 

On the other hand, we derive the first combinatorial property of LP that 
goes beyond the ones present in the abstract frameworks considered so far, and 
that can algorithmically be exploited in a nontrivial setting. 

Note that as an isolated fact, the polynomiality of Random-Facet on the 
LP instances in Matousek’s class is not remarkable; there are other (even very 
trivial) algorithms that achieve this for all problems in the class, as will become 
clear below. The interesting statement is that Random-Facet is fast on the LP 
instances, although it is slow on other instances. This means, the LP instances 
are provably easier in this context; a similar statement in the general situation
would be a major step forward. 

Figure 1 summarizes the situation. The main challenge remains to replace 
the question mark in that figure by a meaningful bound. One way of achieving 
this could be to extract more useful combinatorics from the rich geometry of LP. 
In a specific situation, our result presents a first approach.

The paper is organized as follows. In Section 2 we introduce the concept of 
abstract objective functions on a polytope. 

Section 3 turns to the special case where the polytope is a d-dimensional 
cube, and describes the simplex algorithm with Kalai’s pivot rule Random- 
Facet, specialized to that situation. We also derive the subexponential upper 
bound for the case of a cube, where it is simpler to obtain than for general 
polytopes.

Section 4 introduces Matousek’s class of AOF on the d-cube, along with a 
review of his lower bound proof. Finally, Section 5 contains our new polyno- 
mial analysis of Random-Facet for the LP instances in Matousek’s class. A 
conclusion appears in Section 6. 
Fig. 1. Complexity of Random-Facet, known bounds for $n = 2d$



2 Abstract Objective Functions 

LP can be formulated as the problem of minimizing a linear objective function
in $d$ variables over a polyhedron, given as the intersection of $n$ halfspaces in $\mathbb{R}^d$.
Without loss of generality, one can assume that the polyhedron is simple and
bounded, so that the feasible region of the LP is a simple polytope $P$, meaning
that every vertex has exactly $d$ incident edges. In this setting, the simplex method
traverses the graph of vertices and edges of $P$, along a path of decreasing
objective function value, until the traversal ends in a vertex of minimum value.
A pivot rule decides at each intermediate vertex which decreasing edge will be 
followed, in case there is a choice. (See [5] for a thorough introduction to the 
simplex method.) 

The fact that the traversal actually ends in a vertex of minimum value and 
not just in some local minimum is an obvious but important feature of LP. 
Moreover, this property holds for every face $F$ of $P$ because $F$ is a polytope
itself. This means that the simplex method can be used to find face minima, and
this is a crucial substep of the Random-Facet rule we introduce below. 

If the pivot rule is combinatorial, it will classify edges incident to the current 
vertex only by the property of being increasing or decreasing with respect to the 
objective function. Geometric notions of “steepness” or “amount of progress” as 
frequently considered by pivot rules in practice are not taken into account. 

In that situation, however, the geometric information that is actually used
boils down to an acyclic orientation of the graph of $P$ with the property that
the subgraph induced by a face $F$ has a unique sink, for every nonempty face
$F$ of $P$. If an orientation with these properties is obtained by assigning distinct
‘abstract’ objective function values $\phi(v)$ to the vertices $v$ (with the meaning that
edge $\{v, w\}$ is directed $v \to w$ if and only if $\phi(v) > \phi(w)$), we call $\phi$ an abstract
objective function on $P$.

It is clear that every linear functional in general position w.r.t. $P$ is an AOF,
but there are more, as we will see soon. All simplex algorithms with combinatorial
pivot rules can be run on an AOF and will discover the vertex $v$ of minimum
value $\phi(v)$ in $P$.

It is worth noticing that AOF do not only arise in connection with the simplex
algorithm; in fact, they have a surprising interpretation as shelling orders of
the polytope dual to $P$ [13].



3 The Random-Facet simplex algorithm 

An important special case (which we deal with exclusively in the sequel) arises
when $P$ is combinatorially equivalent to the $d$-dimensional cube $C^d$. In this case,
the vertices can be identified with the set of 0/1-vectors $V := \{0,1\}^d$, where it
is convenient to view $V$ as the vector space $GF(2)^d$ over the two-element field
$GF(2)$. Two vertices $v, v'$ are adjacent in the cube graph if and only if they differ
in exactly one component, i.e., if $v - v' = e_i$ for some unit vector $e_i \in V$.

The nonempty faces of $C^d$ can be identified with pairs $(v, S)$, $v \in V$,
$S \subseteq [d] := \{1, \ldots, d\}$, where

$$(v, S) := \{v' \in V \mid v'_i = v_i \ \forall i \notin S\}.$$

Thus, the face $(v, S)$ consists of all vertices that agree with $v$ outside of $S$. The
dimension of $(v, S)$ is $|S|$, and we have $(v, [d]) = C^d$ and $(v, \emptyset) = \{v\}$.
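On the cube, the defining property of an AOF can be checked by brute force for small $d$. The Python sketch below is our own illustration. Vertices and index sets $S$ are encoded as bitmasks, with bit $i-1$ for coordinate $i$. `phi` is any list of values indexed by vertex, and `face` and `is_aof` are hypothetical names. Acyclicity is automatic here, because the orientation comes from values, so only the unique-sink condition is tested.

```python
def face(v, S):
    """All vertices of the face (v, S): bitmasks agreeing with v outside S."""
    base = v & ~S
    verts, sub = [], S
    while True:                      # enumerate all submasks of S
        verts.append(base | sub)
        if sub == 0:
            return verts
        sub = (sub - 1) & S


def is_aof(phi, d):
    """True if every nonempty face of the d-cube has exactly one sink under phi."""
    for S in range(1 << d):
        coords = [i for i in range(d) if S >> i & 1]
        for v in range(1 << d):
            if v & S:                # one representative per face: v is 0 on S
                continue
            sinks = [u for u in face(v, S)
                     if all(phi[u] < phi[u ^ (1 << i)] for i in coords)]
            if len(sinks) != 1:
                return False
    return True


# A linear functional with coefficients (3, -5, 1.5) is an AOF on the 3-cube ...
linear = [sum(c for i, c in enumerate((3, -5, 1.5)) if u >> i & 1) for u in range(8)]
print(is_aof(linear, 3))             # True
# ... but these values on the square have two sinks, 00 and 11.
print(is_aof([0, 2, 3, 1], 2))       # False
```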

Now we are prepared to describe the algorithm Random-Facet for an AOF
$\phi$ on the $d$-cube. Its basic idea is as follows. Given some vertex $v$, choose a facet
$F$ containing $v$ at random and recursively find the vertex $v'$ of smallest value
$\phi(v')$ in $F$. If $\phi(v') < \phi(v'')$, where $v''$ is the unique neighbor of $v'$ not in $F$, then
stop and return $v'$; otherwise, repeat from $v''$. In pseudocode, the algorithm can
be described as follows ($v$ is the current vertex and $(v, S)$ denotes the face to be
handled; initially, it is the whole cube).

Algorithm 1.

Random-Facet(v, S):

    IF S = ∅ THEN
        RETURN v
    ELSE
        choose i ∈ S at random
        v' := Random-Facet(v, S \ {i})
        IF φ(v') < φ(v' + e_i) THEN
            RETURN v'
        ELSE
            v'' := v' + e_i
            RETURN Random-Facet(v'', S)
        END
    END
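For experimentation, the pseudocode above can be transcribed into Python. This sketch uses our own encoding. Vertices and index sets are bitmasks, and adding $e_i$ over $GF(2)$ is an XOR. `phi` is any callable that induces an AOF, and the one-element list `counter` is a hypothetical device for counting pivot steps. Recursion depth grows with the number of pivots, so the sketch is meant for small $d$ only.

```python
import random


def random_facet(v, S, phi, rng, counter):
    """Random-Facet on the face (v, S); counter[0] accumulates pivot steps."""
    if S == 0:
        return v
    coords = [i for i in range(S.bit_length()) if S >> i & 1]
    i = rng.choice(coords)                        # random facet (v, S \ {i})
    v1 = random_facet(v, S & ~(1 << i), phi, rng, counter)
    if phi(v1) < phi(v1 ^ (1 << i)):              # v1 beats its neighbour v1 + e_i
        return v1
    counter[0] += 1                               # pivot step v'' := v' + e_i
    return random_facet(v1 ^ (1 << i), S, phi, rng, counter)


# Linear objective on the 6-cube; its minimum sets exactly the bits with c_i < 0.
c = (3, -5, 1.5, -2.25, 7, -0.5)


def phi_linear(u):
    return sum(ci for i, ci in enumerate(c) if u >> i & 1)


pivots = [0]
print(random_facet(0b010101, 0b111111, phi_linear, random.Random(1), pivots))  # 42
```

The result 42 is the bitmask `0b101010`. It has exactly the coordinates with negative coefficients set, whatever the seed.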
Because in the first recursive call the set $S$ gets smaller, and in the second
recursive call we have $\phi(v'') < \phi(v)$, the algorithm eventually terminates and
returns the vector

$$\mathrm{opt}(v, S)$$

satisfying $\phi(\mathrm{opt}(v, S)) = \min(v, S)$, where we set $\min(v, S) := \min_{v' \in (v, S)} \phi(v')$.
Here we need the AOF property that the local face minimum returned by the
algorithm is actually a global face minimum.

The complexity of the algorithm can be estimated by counting the number of
pivot steps, which is the number of times the operation $v'' := v' + e_i$ is executed
throughout the algorithm. It is easy to see that the overall number of operations
is larger by a factor of at most $O(d)$. Here is a sketch of the subexponential
bound on the expected number of pivot steps.

Consider a pair $(v, S)$ and $i \in S$. We say that $i$ is fixed w.r.t. $(v, S)$ if
$\phi(v) < \min(v + e_i, S \setminus \{i\})$. This means that $v$ is “better” than the best vertex in $(v, S)$
that differs from $v$ in the $i$-th position. Consequently, if we start the algorithm
Random-Facet on $(v, S)$, the $i$-th position will never get flipped, because all
vertices encountered throughout are at least as good as $v$ itself.

The hidden dimension of $(v, S)$ is then defined as

$$h(v, S) := |S| - |\{i \in S \mid i \text{ is fixed w.r.t. } (v, S)\}|.$$

The motivation for this definition is that although the face $(v, S)$ has dimension
$|S|$, the actual degree of freedom w.r.t. the algorithm Random-Facet is
only $h(v, S)$.
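For small cubes, $h(v, S)$ can be computed directly from the definition of fixed indices. The Python sketch below is illustrative only, with bitmask encoding and the hypothetical helper names `face_min` and `hidden_dimension`. It evaluates $\min(v + e_i, S \setminus \{i\})$ by enumerating the face.

```python
def face(v, S):
    """Vertices agreeing with v outside the bitmask S."""
    base, sub, verts = v & ~S, S, []
    while True:
        verts.append(base | sub)
        if sub == 0:
            return verts
        sub = (sub - 1) & S


def face_min(v, S, phi):
    """min(v, S): smallest phi-value on the face (v, S)."""
    return min(phi(u) for u in face(v, S))


def hidden_dimension(v, S, phi):
    """h(v, S) = |S| minus the number of indices i in S fixed w.r.t. (v, S)."""
    fixed = sum(1 for i in range(S.bit_length())
                if S >> i & 1
                and phi(v) < face_min(v ^ (1 << i), S & ~(1 << i), phi))
    return bin(S).count("1") - fixed


c = (3, -5, 1.5)                      # linear objective on the 3-cube


def phi(u):
    return sum(ci for i, ci in enumerate(c) if u >> i & 1)


print([hidden_dimension(v, 0b111, phi) for v in (0b000, 0b011, 0b010)])  # [3, 2, 0]
```

At the optimum `0b010` every index is fixed, so the hidden dimension is 0, as expected.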

The following is the crucial fact: if $h(v, S) = k$ and the non-fixed indices $i_1, \ldots, i_k$ in
$S$ are ordered such that

$$\min(v, S \setminus \{i_1\}) > \cdots > \min(v, S \setminus \{i_k\}),$$

then the hidden dimension of $(v'', S)$ is at most $k - \ell$ if $i = i_\ell$ in the algorithm.

Now let $T(k)$ denote the maximum expected number of pivot steps that occur
in a call to Random-Facet$(v, S)$, where $(v, S)$ has hidden dimension $k$. From
the preceding discussion one easily proves that $T(0) = 0$ and

$$T(k) \le T(k-1) + \frac{1}{k} \sum_{\ell=1}^{k} \bigl(1 + T(k-\ell)\bigr)$$

for $k > 0$. From this, an upper bound of $T(d) \le \exp(2\sqrt{d}) - 1$ for the expected
number of pivot steps follows; see [7, Lemma 5.21] for a derivation of this bound.
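The recurrence is easy to evaluate numerically. The following Python sketch is our own illustration. It solves the recurrence with equality, which is the worst case the inequality allows, and compares the result with the bound $\exp(2\sqrt{k}) - 1$. It is a numerical sanity check for small $k$, not a proof.

```python
import math


def solve_recurrence(kmax):
    """T(0) = 0, T(k) = T(k-1) + (1/k) * sum_{l=1..k} (1 + T(k-l))."""
    T = [0.0]
    for k in range(1, kmax + 1):
        T.append(T[k - 1] + sum(1 + T[k - l] for l in range(1, k + 1)) / k)
    return T


T = solve_recurrence(40)
print(T[:3])                          # [0.0, 1.0, 2.5]
print(all(T[k] <= math.exp(2 * math.sqrt(k)) - 1 for k in range(41)))  # True
```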

4 Matousek’s Lower Bound Construction 

Matousek considers special AOF on the $d$-cube, defined by regular,
lower-triangular matrices $A \in GF(2)^{d \times d}$. Such matrices have one-entries along the main
diagonal and arbitrary entries below. For given $A$, an AOF $\phi$ is defined by

$$\phi(v) = Av \quad \text{for all } v \in V, \qquad (1)$$
where the values are compared by lexicographical order over $GF(2)^d$, i.e., $w < w'$
if $w_i = 0$ at the smallest index $i$ where $w_i \ne w'_i$. If we consider the values $w$ as
$d$-bit binary numbers $w_1 w_2 \cdots w_d$, we get an intuitive interpretation of this order
as the usual order among natural numbers.

It follows that $v = 0$ is the optimal vertex of the whole cube $C^d$ for every
matrix $A$. To see that $\phi$ is indeed an AOF, we need the following

Lemma 1. Let (v, S) be a face. We have $v = \mathrm{opt}(v, S)$ if and only if $(Av)_i = 0$ for all $i \in S$.

Proof. Assume $v = \mathrm{opt}(v, S)$. Then v is in particular a local minimum in (v, S), i.e. $Av < A(v + e_i) = Av + A_i$ for all $i \in S$, where $A_i$ is the i-th column of A. Because A is lower-triangular with $a_{ii} = 1$, the first index where Av and $Av + A_i$ differ is i, and this implies $(Av)_i = 0$.

Now assume $(Av)_i = 0$ holds for all $i \in S$, and let v' be some other vertex in (v, S). Let $j \in S$ be the smallest index where v and v' differ. Again, the shape of A implies that j is then also the smallest index where Av and Av' differ. By assumption we must have $(Av')_j = 1$, so $\phi(v') > \phi(v)$. Because this holds for all v', $v = \mathrm{opt}(v, S)$ follows. □
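Lemma 1 can be checked by brute force in small dimension. The Python sketch below is our own illustration (all helper names are hypothetical): it draws random regular lower-triangular matrices over GF(2), encodes $\phi(v) = Av$ as a d-bit integer with $w_1$ as the most significant bit (so that integer comparison is the lexicographic order above), and verifies on every face (v, S) that the $\phi$-minimal vertex is exactly the vertex with $(Av)_i = 0$ for all $i \in S$, and that it is the only local minimum of the face.

```python
import random
from itertools import product


def random_matousek_matrix(d, rng):
    """Lower-triangular 0/1 matrix with ones on the diagonal and random bits below."""
    return [[1 if i == j else (rng.randint(0, 1) if j < i else 0) for j in range(d)]
            for i in range(d)]


def mat_vec(A, v):
    """Av over GF(2)."""
    return tuple(sum(a & x for a, x in zip(row, v)) % 2 for row in A)


def phi(A, v):
    """phi(v) = Av read as the binary number w_1 w_2 ... w_d (w_1 most significant)."""
    return int("".join(map(str, mat_vec(A, v))), 2)


def face(v, S):
    """All vertices that agree with v outside the index list S."""
    for bits in product((0, 1), repeat=len(S)):
        u = list(v)
        for i, b in zip(S, bits):
            u[i] = b
        yield tuple(u)


def flip(u, i):
    return u[:i] + (1 - u[i],) + u[i + 1:]


def check_lemma1(A):
    """Check Lemma 1 and the unique-local-minimum (AOF) property on every face."""
    d = len(A)
    for v in product((0, 1), repeat=d):
        for mask in range(1 << d):
            S = [i for i in range(d) if mask >> i & 1]
            verts = list(face(v, S))
            opt = min(verts, key=lambda u: phi(A, u))
            zero_on_S = [u for u in verts if all(mat_vec(A, u)[i] == 0 for i in S)]
            local_min = [u for u in verts if all(phi(A, u) < phi(A, flip(u, i)) for i in S)]
            assert zero_on_S == [opt] and local_min == [opt]
    return True


if __name__ == "__main__":
    rng = random.Random(1)
    print(all(check_lemma1(random_matousek_matrix(d, rng)) for d in range(1, 5)))
```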

The proof shows in particular that (v, S) contains exactly one local minimum, namely opt(v, S), and this proves the AOF property. The lemma also suggests the following alternative formulation of Random-Facet, based on handling values w = Av instead of vertices v.

Algorithm 2.

RF-Flip(w, S):
  IF S = ∅ THEN
    RETURN w
  ELSE
    choose i ∈ S at random
    w' := RF-Flip(w, S \ {i})
    IF w'_i = 0 THEN
      RETURN w'
    ELSE
      w'' := w' + A_i
      RETURN RF-Flip(w'', S)
    END
  END



It is obvious that Random-Facet(v, S) is equivalent to RF-Flip(Av, S). The latter, however, is more convenient in the following sketch of Matousek’s lower bound proof. An intuitive feature of RF-Flip is that every “pivot step” $w'' := w' + A_i$ decreases the value currently maintained by the algorithm. Moreover, the actions of RF-Flip(w, S) only depend on the |S| × |S| submatrix A(S) of A consisting of the rows and columns with indices in S, and on the vector $w_S$, the restriction of w to S.
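Algorithm 2 translates almost line by line into Python. In this sketch (our own; the name `rf_flip`, the tuple encoding of values and the `stats` counter are assumptions, and indices are 0-based) a value w is a 0/1 tuple, a pivot step adds the column $A_i$ over GF(2), and the function returns the final value together with the number of pivot steps performed.

```python
import random


def rf_flip(A, w, S, rng, stats=None):
    """RF-Flip(w, S) from Algorithm 2; S is a frozenset of indices, w a 0/1 tuple."""
    if stats is None:
        stats = {"pivots": 0}
    if not S:
        return w, stats
    i = rng.choice(sorted(S))                    # choose i in S at random
    w1, _ = rf_flip(A, w, S - {i}, rng, stats)   # first recursive call on S \ {i}
    if w1[i] == 0:
        return w1, stats
    column = [row[i] for row in A]               # the i-th column A_i
    w2 = tuple((a + b) % 2 for a, b in zip(w1, column))  # pivot step w'' := w' + A_i
    stats["pivots"] += 1
    return rf_flip(A, w2, S, rng, stats)         # second recursive call on S
```

For example, with the Klee-Minty matrix (all entries on and below the diagonal equal to one) and S = {0, ..., d−1}, every run ends at the zero value, as Lemma 1 predicts, and a run started at the zero value performs no pivot steps.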
Let us first define a concept of rank for pairs (w, S), similar in spirit to the hidden dimension for pairs (v, S) in the previous section. If $S = \{i_1, \dots, i_m\}$, $i_1 < \cdots < i_m$, and t is the smallest index such that $w_{i_t} = 1$, then we define

$$r(w, S) := |S| - t + 1.$$

If no such t exists, we set r(w, S) = 0. In other words, r(w, S) is the number of significant bits in the binary number interpretation $w_{i_1} w_{i_2} \cdots w_{i_m}$ of $w_S$.

Let $T_S(k)$ denote the expected number of pivot steps in a call to RF-Flip(w, S), where A is a random lower-triangular matrix with $a_{ii} = 1$ for all i, and w is a random vector with $r(w, S) \le k$. This means that the expectation is over A, w and the choices of i in the algorithm. Because A and w are random, $T_S(k)$ only depends on the size of S. W.l.o.g. we can assume |S| = k, because insignificant bits in $w_S$ remain insignificant throughout the algorithm and thus never lead to pivot steps. This means that we can write T(k) instead of $T_S(k)$.

The crucial observation is that if the second recursive call is executed at all (which happens with probability 1/2, depending on the bit $w_i$ of the start value w), then $r(w'', S) \le k - \ell$ if $i = i_\ell$. This follows quite directly from Lemma 1. Moreover, w'' is of the form $w' + A_i$ and therefore random again, because $A_i$ was random. It remains to observe that the actions of RF-Flip(w'', S) do not depend on $A_i$ anymore, so that w'' is independent of the (random) entries of the matrix that are still relevant. From this, one can prove that T(0) = 0 and

$$T(k) \ge T(k-1) + \frac{1}{2k}\sum_{\ell=1}^{k}\bigl(1 + T(k-\ell)\bigr)$$

for k > 0, see [20] for details. A subexponential lower bound of

$$T(d) = \Omega\bigl(e^{\sqrt{2d}}/\sqrt[4]{d}\bigr) \qquad (2)$$

for the expected number of pivot steps in the whole cube $C^d$ follows, see [7, Result 6.10].
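The lower-bound recurrence can be explored in the same way. This Python sketch (our own, with hypothetical names) solves it with equality; the equality solution is $\sum_{j=0}^{k}\binom{k}{j}2^{-j}/j! - 1$, which grows like $e^{\sqrt{2k}}/\sqrt[4]{k}$ up to a constant factor. The script prints the ratio between the two for a few values of k; it is an illustration, not a proof of (2).

```python
from fractions import Fraction
from math import comb, exp, factorial, sqrt


def matousek_lower(d: int) -> list:
    """Solve T(k) = T(k-1) + 1/(2k) * sum_{l=1}^{k} (1 + T(k-l)), T(0) = 0, with equality."""
    T = [Fraction(0)]
    prefix = Fraction(0)  # T(0) + ... + T(k-1)
    for k in range(1, d + 1):
        prefix += T[k - 1]
        T.append(T[k - 1] + (k + prefix) / (2 * k))
    return T


if __name__ == "__main__":
    T = matousek_lower(200)
    for d in (25, 50, 100, 200):
        closed_form = sum(Fraction(comb(d, j), 2 ** j * factorial(j)) for j in range(d + 1)) - 1
        assert T[d] == closed_form
        # ratio between the equality solution and e^{sqrt(2d)} / d^{1/4}
        print(d, float(T[d]) * d ** 0.25 / exp(sqrt(2 * d)))
```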

We note that this proof can be derandomized, i.e. we can construct a fixed matrix A and a fixed start value w such that RF-Flip(w, [d]) is subexponentially slow (with worse constants than in (2), though). This construction will appear in the full paper.



5 A Polynomial Bound for the LP Instances

In this section we show that the subexponential lower bound developed in the previous section does not apply to the LP instances in Matousek’s class, which we also call realizable instances below. We prove that Random-Facet solves any realizable instance with an expected polynomial number of $O(d^2)$ pivot steps.

The proof consists of two stages. In the first stage we observe that if the AOF generated by a matrix A is realizable, then A does not contain certain forbidden submatrices which come from 3-dimensional realizability restrictions. Subsequently we show that this imposes a rather strong condition on $A^{-1}$, which already implies that most matrices A generate non-realizable instances.

In the second stage, we show how Random-Facet exploits the structure of $A^{-1}$ to arrive at a polynomial bound.



5.1 Three-dimensional Realizability Restrictions

An AOF $\phi$ on the d-cube is realizable if and only if there exists a polytope P, combinatorially equivalent to the unit d-cube $[0,1]^d$ (we say that P is a combinatorial d-cube in this case), and a linear objective function $\alpha: \mathbb{R}^d \to \mathbb{R}$, such that the orientation generated by $\alpha$ on the graph of P is isomorphic to the orientation generated by $\phi$ on $C^d$. Note that we cannot assume P to be equal to the unit cube in this definition. For example, any AOF $\phi$ on the 2-cube satisfying $\phi(0,0) < \phi(1,0) < \phi(1,1) < \phi(0,1)$ generates the orientation of Figure 2 (left), and it takes a “deformed” unit square to realize it, see Figure 2 (right). (On the unit square itself this is impossible: a linear function $\alpha$ with $\alpha(1,0) > \alpha(0,0)$ also satisfies $\alpha(1,1) > \alpha(0,1)$.)




Fig. 2. Orientation not realized by unit cube 



A necessary condition for realizability is of course that the directed subgraphs induced by $\phi$ on the 3-dimensional faces of $C^d$ are realizable in this sense. The following lemma develops a condition for this in the case of Matousek’s AOF.

Lemma 2. Assume the AOF generated by $A \in GF(2)^{d \times d}$ is realizable. Then for all $S \subseteq [d]$, |S| = 3, the submatrix A(S) is not equal to

$$A_1 := \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix} \quad\text{or}\quad A_2 := \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}.$$

Proof. It is easy to check from the definition (1) of Matousek’s AOF that $A_1$ and $A_2$ generate the $C^3$-orientations depicted in Figure 3.

In [10] it is shown that these orientations do not come from a linear objective function on a combinatorial 3-cube (they are actually the only ones on the 3-cube that don’t, up to isomorphism). Another way to see this is due to Holt and Klee, who have shown, based on [3], that a necessary condition for realizability is the existence of three vertex-disjoint directed paths between source and sink [14]. In either of the two orientations in Figure 3, only two such paths can be found.

Fig. 3. Cube orientations generated by $A_1$ (left) and $A_2$ (right)

From the interpretation of Random-Facet as Algorithm 2 (RF-Flip), it is clear that the orientation induced by A on a 3-face (v, S) is isomorphic to the orientation induced by A(S) on $C^3$. Because the former is realizable by assumption, A(S) must be distinct from $A_1$ and $A_2$. □
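The path criterion of Holt and Klee is easy to test mechanically. The following Python sketch (our own illustration; the function names are hypothetical) builds the orientation of $C^3$ induced by a matrix via (1), directing each edge towards the endpoint with smaller value, and counts vertex-disjoint source-to-sink paths with a unit-capacity max-flow computation after splitting every vertex into an "in" and an "out" copy. For $A_1$ and $A_2$ it finds only two such paths, whereas the identity matrix and the Klee-Minty matrix (all entries on and below the diagonal equal to one) yield three.

```python
from collections import deque
from itertools import product


def phi(A, v):
    """phi(v) = Av over GF(2), read as a binary number with row 1 most significant."""
    bits = [sum(a & x for a, x in zip(row, v)) % 2 for row in A]
    return int("".join(map(str, bits)), 2)


def disjoint_paths(A):
    """Maximum number of vertex-disjoint directed paths from source to sink."""
    d = len(A)
    verts = list(product((0, 1), repeat=d))
    val = {v: phi(A, v) for v in verts}
    source, sink = max(verts, key=val.get), min(verts, key=val.get)
    cap = {}

    def add_arc(x, y):
        cap[(x, y)] = cap.get((x, y), 0) + 1
        cap.setdefault((y, x), 0)           # residual arc

    for v in verts:
        add_arc((v, "in"), (v, "out"))      # vertex capacity 1 (node splitting)
        for i in range(d):
            u = v[:i] + (1 - v[i],) + v[i + 1:]
            if val[u] < val[v]:             # cube edges point towards smaller values
                add_arc((v, "out"), (u, "in"))
    adj = {}
    for x, y in cap:
        adj.setdefault(x, []).append(y)
    s, t = (source, "out"), (sink, "in")
    flow = 0
    while True:                             # Edmonds-Karp: BFS augmenting paths
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            x = queue.popleft()
            for y in adj[x]:
                if y not in parent and cap[(x, y)] > 0:
                    parent[y] = x
                    queue.append(y)
        if t not in parent:
            return flow
        y = t
        while parent[y] is not None:
            x = parent[y]
            cap[(x, y)] -= 1
            cap[(y, x)] += 1
            y = x
        flow += 1
```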

As it turns out, the forbidden submatrix conditions established by the lemma are quite strong and manifest themselves directly in $A^{-1}$ (which is again lower-triangular). The following is the crucial result.

Theorem 1. If A does not contain submatrices $A(S) = A_1$ or $A(S) = A_2$, then $A^{-1}$ has no more than two one-entries per row (including the diagonal one).

This shows that among the $2^{\binom{d}{2}}$ matrices A, at most d! generate realizable instances (a unit lower-triangular matrix with at most one off-diagonal one-entry per row has i choices for row i, so there are d! of them, and inversion is a bijection on unit lower-triangular matrices).
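Theorem 1 and the count d! can be checked exhaustively in small dimension. The Python sketch below (our own; the helper names are hypothetical) enumerates all $2^{\binom{d}{2}}$ unit lower-triangular 0/1 matrices, inverts them over GF(2) via $(I+N)^{-1} = I + N + \dots + N^{d-1}$ for the strictly lower part N, and tests for the forbidden submatrices $A_1$ and $A_2$ of Lemma 2. For d = 4 it reports that every matrix avoiding $A_1$ and $A_2$ has a sparse inverse, and that exactly 4! = 24 matrices have a sparse inverse.

```python
from itertools import combinations, product

A1 = ((1, 0, 0), (0, 1, 0), (1, 1, 1))
A2 = ((1, 0, 0), (1, 1, 0), (0, 1, 1))


def unit_lower_triangular(d):
    """All d x d lower-triangular 0/1 matrices with ones on the diagonal."""
    below = [(i, j) for i in range(d) for j in range(i)]
    for bits in product((0, 1), repeat=len(below)):
        A = [[int(i == j) for j in range(d)] for i in range(d)]
        for (i, j), b in zip(below, bits):
            A[i][j] = b
        yield A


def mat_mul(X, Y):
    """Matrix product over GF(2)."""
    d = len(X)
    return [[sum(X[i][k] & Y[k][j] for k in range(d)) % 2 for j in range(d)] for i in range(d)]


def inverse_gf2(A):
    """(I + N)^{-1} = I + N + ... + N^{d-1} over GF(2), N the strictly lower part of A."""
    d = len(A)
    N = [[A[i][j] if i != j else 0 for j in range(d)] for i in range(d)]
    inv = [[int(i == j) for j in range(d)] for i in range(d)]
    power = [row[:] for row in inv]
    for _ in range(d - 1):
        power = mat_mul(power, N)
        inv = [[(inv[i][j] + power[i][j]) % 2 for j in range(d)] for i in range(d)]
    return inv


def has_forbidden(A):
    """True if some 3 x 3 principal submatrix A(S) equals A1 or A2."""
    for S in combinations(range(len(A)), 3):
        sub = tuple(tuple(A[i][j] for j in S) for i in S)
        if sub in (A1, A2):
            return True
    return False


def sparse(M):
    """At most two one-entries per row."""
    return all(sum(row) <= 2 for row in M)


if __name__ == "__main__":
    mats = list(unit_lower_triangular(4))
    avoiding = [A for A in mats if not has_forbidden(A)]
    print(len(mats), len(avoiding), sum(sparse(inverse_gf2(A)) for A in mats))
    print(all(sparse(inverse_gf2(A)) for A in avoiding))
```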

Proof. Because A is invertible and lower-triangular, each column $A_i$ can be written in the form

$$A_i = e_i + \sum_{j \in J(i)} A_j, \qquad (3)$$

where $J(i) \subseteq \{i+1, \dots, d\}$ is a unique index set. We now prove two claims. For this let $A = (a_{ij})$, $1 \le i, j \le d$, i.e. $a_{ij}$ is the element in row i and column j.

Claim 1. For all i, the columns $A_j$, $j \in J(i)$, are disjoint in the sense that no two of them have a one-entry in the same row.

Proof (of Claim 1): assume on the contrary that $j_1 < j_2 \le k$ exist such that $j_1, j_2 \in J(i)$ and $a_{kj_1} = a_{kj_2} = 1$. Suppose that $(j_2, k)$ is lexicographically smallest with this property.

Case 1. $j_2 < k$. Then we have the situation of Figure 4 (left), and because $(j_2, k)$ was lexicographically smallest with $a_{kj_1} = a_{kj_2} = 1$, we have $\star = a_{j_2 j_1} = 0$, which gives a forbidden submatrix.

Fig. 4. Case $j_2 < k$ (left), $j_2 = k$ (right)

Case 2. $j_2 = k$. Then the situation is as in Figure 4 (right). Because in all columns $j \in J(i)$ with $j < j_1$, position $j_1$ must be zero (otherwise we had a lexicographically smaller conflict again), it follows from (3) that $\star = a_{j_1 i} = 1$. For the same reason, in all columns $j \in J(i)$, $j \neq j_1$, $j < j_2$, position k is zero. But this implies $\# = a_{ki} = 0$, and we have a forbidden submatrix again.

Claim 2. The index sets J(i), $i \in [d]$, are pairwise disjoint, i.e. in representing the columns according to (3), every column $A_j$ is used for at most one other column $A_i$.

Proof (of Claim 2): assume a column $A_j$ is used twice for distinct columns $A_{i_1}, A_{i_2}$, $i_1 < i_2 < j$. Because $A_j$ is disjoint from all other columns used to represent $A_{i_1}$ resp. $A_{i_2}$, we get $a_{j i_1} = a_{j i_2} = 1$, see Figure 5 (left). In order not to get a forbidden submatrix, we must have $\star = a_{i_2 i_1} = 1$.

Fig. 5. $A_j$ is used for $A_{i_1}$ and $A_{i_2}$ (left), $A_{j'}$ and $A_j$ contribute to $A_{i_1}$ (right)

Let $j' \in J(i_1)$ be the column ‘responsible’ for the one-entry $a_{i_2 i_1}$. We must have $j' \neq i_2$, because $A_{j'}$ is disjoint from $A_j$ while $A_{i_2}$ is not. Therefore, $j' < i_2$, and the situation is that of Figure 5 (right).

Because both $A_{j'}$ and $A_j$ contribute to $A_{i_1}$, they are disjoint, and $\star = a_{j j'} = 0$ must hold. Then, however, we have a forbidden submatrix once more.

Now we are prepared to prove the statement of the theorem, which follows if we can show that the columns of $M := E + A^{-1}$ are disjoint in the sense previously defined, where E is the unit matrix. The i-th column of M is given as

$$M_i = e_i + A^{-1}e_i = A^{-1}(A_i + e_i) = A^{-1}\sum_{j \in J(i)} A_j = \sum_{j \in J(i)} e_j.$$

The disjointness of the J(i), $i \in [d]$, now immediately implies the disjointness of the columns $M_i$. □

5.2 Random-Facet under Realizability Restrictions

Theorem 1 implies that $A^{-1}$ is extremely sparse in the realizable case, and this will entail that Random-Facet is fast on the AOF induced by A. Before we go into the formal analysis, here is the intuitive argument why this is the case, and how the inverse matrix $A^{-1}$ comes in. Consider starting the algorithm on the pair (v, [d]). With probability 1/d, the first recursive call will compute $v' = \mathrm{opt}(v, [d] \setminus \{i\})$, for some $i \in [d]$. We know from Lemma 1 that $Av' = 0$ or $Av' = e_i$, and only in the latter case does the algorithm perform a second recursive call, starting with the vertex $v'' = v' + e_i$. It follows that

$$v'' = A^{-1}A(v' + e_i) = A^{-1}(e_i + A_i) = A^{-1}e_i + e_i.$$

This means that the possible v'' coming up in the second recursive call are nothing else than the pairwise disjoint columns of the matrix M considered in the proof of Theorem 1. Now, if that call fixes some position $j \in [d]$ in its first recursive step, it fixes a zero position with high probability, because $v''_j \neq 0$ holds for at most one i. This means that already after the first recursive call in Random-Facet(v'', [d]), the problem has been optimally solved in d − 1 out of d cases, because fixing a zero means that the optimal vertex 0 lies in the facet that is being considered.

In the following formal derivation of the $O(d^2)$ bound, it will be more convenient to argue about the algorithm RF-Flip instead. Let T(w, S) be the expected number of pivot steps in a call to RF-Flip(w, S), and define

$$w^{(i)} := e_i + A_i \quad \text{for all } i \in [d].$$

The following equation is the basis of the analysis.

Lemma 3. Let |S| = m > 0 and define opt(w, S) to be the value returned by RF-Flip(w, S). With $w^{(i,j)} := \mathrm{opt}(w^{(i)}, S \setminus \{j\})$ we have

$$T(w^{(i)}, S) = \frac{1}{m}\sum_{j \in S}\Bigl(T(w^{(i)}, S \setminus \{j\}) + \bigl(1 + T(w^{(j)}, S)\bigr)\,[w^{(i,j)}_j = 1]\Bigr). \qquad (4)$$

Here, [·] is the indicator variable for the event in brackets.
The proof follows immediately from the description of RF-Flip in Algorithm 2, together with the following little observation: the value $w' = w^{(i,j)}$ computed from the first recursive call satisfies $w'_k = 0$ for all $k \in S \setminus \{j\}$ by Lemma 1. Exactly if $w'_j = 1$, we have a second recursive call with value $w'' = w' + A_j$. We do not necessarily have $w'' = w^{(j)}$ as suggested by the equation, but when we restrict both values to S, they agree. As previously observed, we then have $T(w'', S) = T(w^{(j)}, S)$.

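Define

$$\bar{T}(S) := \sum_{i \in S} T(w^{(i)}, S).$$

Using (4) and the fact that $T(w^{(j)}, S \setminus \{j\}) = T(w^{(j)}, S)$ (the bit $w^{(j)}_j$ is insignificant and does not contribute any pivot steps in RF-Flip$(w^{(j)}, S)$), we obtain

$$\bar{T}(S) = \frac{1}{m}\sum_{j \in S}\Bigl(\bar{T}(S \setminus \{j\}) + T(w^{(j)}, S) + \bigl(1 + T(w^{(j)}, S)\bigr)\sum_{i \in S}[w^{(i,j)}_j = 1]\Bigr), \quad m \ge 1. \qquad (5)$$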

The claim now is that $w^{(i,j)}_j = 1$ for at most one i. We have $w^{(j,j)}_j = 0$ ($w^{(j)}_j$ is an insignificant bit which therefore never gets flipped in a pivot step), and for $i \neq j$ we can argue as follows. By definition of $w^{(i,j)}$ we know that

(i) $w^{(i,j)}_k = 0$ for all $k \in S \setminus \{j\}$, and

(ii) $\bigl(A(S)^{-1}(w^{(i,j)}_S - w^{(i)}_S)\bigr)_k = 0$ for all $k \notin S \setminus \{j\}$.

The second condition holds because the vectors corresponding to the two values are in the same facet $(v, S \setminus \{j\})$. Condition (ii) implies $\bigl(A(S)^{-1}(w^{(i,j)}_S - w^{(i)}_S)\bigr)_j = 0$. Using the definition of $w^{(i)}$ and property (i), one deduces that $w^{(i,j)}_j$ is equal to entry i in row j of $A(S)^{-1}$. By Theorem 1 (which also applies to A(S)), at most one of these entries is nonzero, and this proves the claim. Then, (5) implies

$$\bar{T}(S) \le \frac{1}{m}\sum_{j \in S}\bar{T}(S \setminus \{j\}) + 1 + \frac{2}{m}\bar{T}(S), \quad m > 2.$$

If we let $\bar{T}(m) := \max_{|S| = m}\bar{T}(S)$, we get $\bar{T}(m) \le \frac{m}{m-2}\bigl(\bar{T}(m-1) + 1\bigr)$ for m > 2 and $\bar{T}(2) \le 1$ (by directly inspecting the possible cases), from which we obtain

$$\bar{T}(m) \le 3\binom{m}{2} - m, \quad m \ge 2. \qquad (6)$$

To conclude the analysis we observe that

$$T(w, S) \le \frac{1}{m}\sum_{j \in S}\bigl(T(w, S \setminus \{j\}) + 1 + T(w^{(j)}, S)\bigr)$$

for all start values w. With (6) and $T(m) := \max_{w,\,|S| = m} T(w, S)$ we get T(0) = 0, T(1) = 1 and

$$T(m) \le T(m-1) + 1 + \frac{1}{m}\Bigl(3\binom{m}{2} - m\Bigr) = T(m-1) + \frac{3}{2}(m-1), \quad m \ge 2,$$

from which

$$T(m) \le \frac{3}{2}\binom{m}{2} + 1$$

follows. This gives the main result of the paper.
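The two recurrences can be checked with exact arithmetic. This Python sketch (our own; the names are hypothetical) iterates both with equality and confirms that their solutions have the closed forms $\bar{T}(m) = 3\binom{m}{2} - m$ and $T(m) = \frac{3}{2}\binom{m}{2} + 1$.

```python
from fractions import Fraction
from math import comb


def sum_bound(m_max):
    """Tbar(2) = 1, Tbar(m) = m/(m-2) * (Tbar(m-1) + 1) for m > 2 (equality case)."""
    Tbar = {2: Fraction(1)}
    for m in range(3, m_max + 1):
        Tbar[m] = Fraction(m, m - 2) * (Tbar[m - 1] + 1)
    return Tbar


def pivot_bound(m_max):
    """T(1) = 1, T(m) = T(m-1) + 1 + (3*C(m,2) - m)/m for m >= 2 (equality case)."""
    T = {0: Fraction(0), 1: Fraction(1)}
    for m in range(2, m_max + 1):
        T[m] = T[m - 1] + 1 + Fraction(3 * comb(m, 2) - m, m)
    return T


if __name__ == "__main__":
    Tbar, T = sum_bound(10), pivot_bound(10)
    for m in range(2, 11):
        print(m, Tbar[m], 3 * comb(m, 2) - m, T[m], Fraction(3, 2) * comb(m, 2) + 1)
```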



Theorem 2. If the AOF defined by $A \in GF(2)^{d \times d}$ via (1) arises from a linear program on a combinatorial d-cube, then the expected number of pivot steps in Algorithm 1 Random-Facet is bounded by

$$\frac{3}{2}\binom{d}{2} + 1$$

for any start vertex v.
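The bound can be compared with exact values on the Klee-Minty instance, which is realizable. Because the value returned by RF-Flip(w, S) is the unique optimum of the face, it does not depend on the random choices, so the expected number of pivot steps satisfies an exact recursion over pairs (w, S). The Python sketch below (our own; names hypothetical, indices 0-based) evaluates this recursion with memoization and rational arithmetic; for 1 ≤ d ≤ 5 the worst start value does not exceed $\frac{3}{2}\binom{d}{2} + 1$ (for d = 3 the worst case is 11/3).

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product
from math import comb


def klee_minty(d):
    """Matousek matrix with all entries on and below the diagonal equal to one."""
    return tuple(tuple(int(j <= i) for j in range(d)) for i in range(d))


def expected_pivots(A):
    """Exact expected number of pivot steps of RF-Flip(w, [d]) for every start value w."""
    d = len(A)
    cols = [tuple(A[r][c] for r in range(d)) for c in range(d)]

    def add_column(w, i):
        return tuple((a + b) % 2 for a, b in zip(w, cols[i]))

    @lru_cache(maxsize=None)
    def solve(w, S):
        # Returns (value returned by RF-Flip(w, S), expected pivots); the returned
        # value is the unique optimum, so it does not depend on the random choices.
        if not S:
            return w, Fraction(0)
        total, result = Fraction(0), w
        for i in S:                                  # each i is chosen with prob. 1/|S|
            w1, first = solve(w, S - {i})
            if w1[i] == 0:
                result, steps = w1, first
            else:
                result, second = solve(add_column(w1, i), S)
                steps = first + 1 + second
            total += steps
        return result, total / len(S)

    full = frozenset(range(d))
    return {w: solve(w, full)[1] for w in product((0, 1), repeat=d)}


if __name__ == "__main__":
    for d in range(1, 6):
        worst = max(expected_pivots(klee_minty(d)).values())
        print(d, worst, Fraction(3, 2) * comb(d, 2) + 1)
```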



6 Conclusion

We have shown that the simplex algorithm Random-Facet takes an expected $O(d^2)$ number of pivot steps on a subclass of Matousek’s AOF, characterized by a sparsity condition according to Theorem 1. This subclass contains at least all realizable instances, but we do not know whether it contains only realizable instances. On the other hand, there are (non-realizable) instances in the class where the expected number of steps is of the order $\Omega(e^{\sqrt{2d}}/\sqrt[4]{d})$. This means that we have presented the first scenario in which the known combinatorial LP frameworks are provably weaker than LP itself.

Our $O(d^2)$ upper bound on the number of pivot steps is tight. Matousek’s class also contains an instance equivalent to the d-dimensional Klee-Minty cube (when all entries of A below the diagonal are one), and the behavior of Random-Facet on this polytope can be analyzed completely: for some start vertices, the expected number of pivot steps is $\Theta(d^2)$ [20,9].

The main open problem is to extend our result to AOF beyond Matousek’s class. For example, is Random-Facet in fact polynomial on all realizable AOF on the d-cube? Even for this special polytope, nothing is known that goes beyond the subexponential bound of Section 3. It would also be interesting to find a realizable AOF that requires asymptotically more than $O(d^2)$ pivot steps. Namely, although there is no reason to believe that $O(d^2)$ is an upper bound in the general case, no ideas are currently known that may possibly lead to worse bounds. In this respect, the situation is quite similar to that of the algorithm Random-Edge; here, the best lower bound that is known for AOF on the d-cube is $\Omega(d^2/\log d)$, even when non-realizable AOF are admitted [9]. (Incidentally, this bound is obtained for the d-dimensional Klee-Minty cube again.) Breaking the $d^2$-barrier from below would therefore require substantially new classes of examples, for either of the two algorithms.
Abstract. We relate the mixing time of Markov chains on a partial order with unique minimal and maximal elements to the solution of associated linear programs. The linear minimization program we construct has one variable per state and (the square of) its solution is an upper bound on the mixing time. The proof of this theorem uses the coupling technique and a generalization of the distance function commonly used in this context. Explicit solutions are obtained for the simple Markov chains on the hypercube and on the independent sets of a complete bipartite graph.

As an application we define a new Markov chain on the down-sets (ideals) of a partial order for which our technique yields a simple proof of rapid mixing, provided that in the Hasse graph of the partial order the number of elements at distance at most 2 from any given element is bounded by 4. This chain is a variation of the Luby-Vigoda chain on independent sets, which can also be used directly to sample down-sets, but our result applies to a larger class of partial orders.



1 Introduction

When confronted with a large set of configurations (a state space) S, about which little can be deduced analytically, one might attempt to estimate some of its parameters by random sampling, i.e. by Monte Carlo methods. It is well known [5] that the ability to sample states at random from a distribution close to uniform (or some other weighting w of the states) is usually sufficient to estimate the size of the state space (or its weighted analogue, the partition function $Z := \sum_{x \in S} w(x)$).

1.1 The Markov chain Monte Carlo method

However, even random sampling from S might be difficult. A general approach to overcome this difficulty is to start from some initial state $X_0 = x \in S$ and then apply some local random transformation f for N > 0 steps, computing $X_{t+1} := f(X_t)$ for $0 \le t < N$, to obtain a random state $X_N \in S$. The process $\{X_t\}$ is a discrete time Markov chain MC = (S, P) on S with transition probabilities P satisfying $P(x, y) = \Pr[f(x) = y]$ for all $x, y \in S$. If this Markov chain is ergodic, then the distribution of $X_N$ will converge to the chain’s stationary distribution, say $\pi$, as N grows large, thus providing a sample $X_N$ from $\pi$ with small bias for sufficiently large N. This method is often called Markov chain Monte Carlo (MCMC), and its main difficulty is to define the random function f, or alternatively the chain MC, such that it is possible to derive an upper bound on the mixing rate of MC (the smallest t so that $P^t(x, \cdot)$ is close to $\pi$). Such a bound can then be used to fix an integer N that guarantees small bias. While considerable progress has been made in bounding the mixing rate, it remains a difficult task [2,11,7].
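As a minimal illustration of the method, the sketch below (our own; it uses as f the hypercube chain that appears as Example 1 below, and the function names are hypothetical) iterates the random transformation N times starting from $X_0 = \emptyset$ and returns $X_N$.

```python
import random


def hypercube_step(X, n, rng):
    """One transition f(X): pick (u, sign) uniformly, then add or remove u."""
    u = rng.randrange(n)
    return X | {u} if rng.random() < 0.5 else X - {u}


def mcmc_sample(n, N, rng, start=frozenset()):
    """Run X_{t+1} := f(X_t) for N steps from X_0 = start and return X_N."""
    X = start
    for _ in range(N):
        X = hypercube_step(X, n, rng)
    return X


if __name__ == "__main__":
    rng = random.Random(42)
    print(sorted(mcmc_sample(10, 200, rng)))
```

Since the stationary distribution of this chain is uniform, repeated calls with a large enough N return approximately uniform random subsets of {0, ..., n−1}.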

1.2 Overview of technique and results

Consider a Markov chain MC on a partially ordered set $(S, \sqsubseteq, s_{\min}, s_{\max})$ that has unique minimal and maximal elements $s_{\min}$ and $s_{\max}$, respectively. Let us say that MC is monotone if there exists a monotone transition function f for it (see Definitions 2 and 3). A rank function on the poset $(S, \sqsubseteq)$ is a strictly monotone function $h: S \to \mathbb{N}$. In Section 3 we define, for a given monotone Markov chain MC, a set of linear constraints on a rank function h on S, such that the mixing time of the chain is bounded essentially by $h(s_{\max})^2$ for any h satisfying the constraints (Theorem 2). This means that we can obtain a bound on the mixing rate as the solution to the linear minimization program:

minimize $h(s_{\max})$ subject to $h \in \mathcal{H}_0$,

where $\mathcal{H}_0$ is the polyhedron defined by the linear constraints on h. Basically we use a simple coupling for the Markov chain and a special kind of distance function (defined by a rank function) to bound the coupling time and therefore also the mixing rate (see Remark 1).

The large number of linear constraints usually makes it infeasible to solve the program automatically. For the simple cases of Markov chains on the n-dimensional hypercube and on the independent sets of a complete bipartite graph, we find an explicit solution (see Example 1 and Section 4.5, respectively).

In Sections 4 and 5 we apply our technique to obtain bounds on the mixing rate of Markov chains on the down-sets (or ideals) of a partial order. In each case we prove that the rank function $h(x) = |x|$ (cardinality of x) satisfies the constraints for the respective Markov chain if the degree of the poset is appropriately bounded. The first chain is just the random walk on the lattice of the down-sets. We show that this chain mixes rapidly for degree-2 posets (see Definition 5). The second Markov chain is a variation of the chain in [8] on independent sets. For an element u in the poset let $d_2(u)$ be the number of elements at distance at most 2 from u in the Hasse graph¹ of the poset. Our chain mixes rapidly if $d_2(u) \le 4$ for every element u (Theorem 6). As explained in Remark 5, Luby and Vigoda’s chain can also be used directly, but ours can be shown to mix rapidly for a larger class of posets.

¹ The Hasse graph of the poset $(S, \sqsubseteq)$ is the directed graph on S with an edge from x to y whenever $x \sqsubset y$ and there is no element z satisfying $x \sqsubset z \sqsubset y$.
2 Preliminaries

Let MC = (S, P) be an ergodic Markov chain on a finite set of states S. Ergodicity implies the existence of a stationary distribution $\pi$ on S that satisfies $\lim_{t \to \infty} P^t(x, y) = \pi(y)$ for all $x, y \in S$. The (total) variation distance is usually used as metric on distributions p and q on S:

$$\|p - q\| := \frac{1}{2}\sum_{x \in S}|p(x) - q(x)| = \max\{|p(A) - q(A)| : A \subseteq S\}.$$

The time an ergodic Markov chain needs to be close to its stationary distribution $\pi$ is called the mixing rate. For initial state $x \in S$ it is defined by

$$\tau_x(\varepsilon) := \min\{t > 0 : \forall t' \ge t\ \ \|P^{t'}(x, \cdot) - \pi\| \le \varepsilon\}$$

and in general by

$$\tau(\varepsilon) := \max\{\tau_x(\varepsilon) : x \in S\}.$$
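For very small chains, $\tau(\varepsilon)$ can be computed exactly from these definitions. The Python sketch below (our own; names hypothetical) does this for the hypercube chain of Example 1 below, with states encoded as bit masks and rational arithmetic. It stops at the first t with $\|P^t(x,\cdot) - \pi\| \le \varepsilon$, which is legitimate because multiplying by a stochastic matrix never increases the variation distance to $\pi$. For n = 2 the distances from every start state are 1/4, 1/8, 1/16, ..., so $\tau(1/8) = 2$.

```python
from fractions import Fraction


def hypercube_chain(n):
    """Transition matrix of Example 1; a state is the bit mask of a subset of U."""
    size = 1 << n
    P = [[Fraction(0)] * size for _ in range(size)]
    for x in range(size):
        for u in range(n):
            P[x][x | (1 << u)] += Fraction(1, 2 * n)   # omega = (u, +)
            P[x][x & ~(1 << u)] += Fraction(1, 2 * n)  # omega = (u, -)
    return P


def step(dist, P):
    """One transition: dist -> dist * P."""
    size = len(P)
    return [sum(dist[x] * P[x][y] for x in range(size)) for y in range(size)]


def variation_distance(p, q):
    return sum(abs(a - b) for a, b in zip(p, q)) / 2


def mixing_rate(P, pi, eps, t_max=10_000):
    """tau(eps) = max_x tau_x(eps), assuming the chain converges to pi."""
    worst = 0
    for x in range(len(P)):
        dist = [Fraction(int(y == x)) for y in range(len(P))]
        for t in range(1, t_max + 1):
            dist = step(dist, P)
            if variation_distance(dist, pi) <= eps:
                break
        else:
            raise RuntimeError("no convergence within t_max steps")
        worst = max(worst, t)
    return worst
```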

We can sample elements of S with a distribution $\varepsilon$-close in variation distance to $\pi$ by starting at some arbitrary state of S and simulating transitions of MC for $t = \tau(\varepsilon)$ steps. To bound the mixing rate one can sometimes use a coupling: the construction of a random process $\{(X_t, Y_t)\}$ on pairs of states, such that $\{X_t\}$ and $\{Y_t\}$, seen by themselves, obey the same law as the original random process.

Definition 1 (Coupling). A coupling for a Markov chain MC = (S, P) is a Markov chain $\{(X_t, Y_t)\}$ on $S \times S$, such that the transition matrix Q of the coupling satisfies for all $x, y \in S$,

$$\forall x' \in S: \sum_{y' \in S} Q(x, y, x', y') = P(x, x') \quad\text{and}\quad \forall y' \in S: \sum_{x' \in S} Q(x, y, x', y') = P(y, y').$$

Any coupling $\{(X_t, Y_t)\}$ for a finite and ergodic Markov chain will (with probability 1) eventually hit the diagonal $X_t = Y_t$ [3, p. 274]. The time at which this happens is called the coupling time and it is a random variable that depends on the initial distribution of the process. Denote by $T_Q^{x,y}$ the coupling time of a coupling Q, when started from initial state $(X_0, Y_0) = (x, y)$, and define the expected coupling time of Q by

$$\bar{T} = T_Q := \max\{E[T_Q^{x,y}] : x, y \in S\}.$$

An upper bound on the mixing rate is given, for example, in [1]:

$$\tau(\varepsilon) \le 2e\,T_Q(1 - \ln\varepsilon), \qquad (1)$$

valid for all $0 < \varepsilon < 1$ (where ln is the natural logarithm, i.e. to base e).



2.1 Bounding the coupling time

To bound $\bar{T}$, [1,7] use the following idea. Assume we have defined a real-valued and bounded distance function $\Phi$ on S. The random process $\Phi(X_t, Y_t)$ is then real-valued and bounded. If in addition one can show that it decreases, in some probabilistic sense, then this should give us a bound on $\bar{T}$.

Remark 1. For a Markov chain on a partial order the derivation of bounds on the mixing rate can be simplified, as we explain now.

1. The coupling on MC can be constructed from any monotone transition function for MC, as explained below (this part is based mainly on [9]).

2. Instead of a distance function we use a rank function h; this induces a distance function on comparable states x, y by $\Phi(x, y) = \max\{h(x), h(y)\} - \min\{h(x), h(y)\}$.

3. Finally, the decrease of the expected distance after one transition step is expressed by linear constraints on h.

Definition 2 (Transition function). We say that f is a transition function for the Markov chain MC = (S, P) if there is a finite set $\Omega$ so that $f: S \times \Omega \to S$ and for each $x, y \in S$

$$\Pr[f(x, \omega) = y] = P(x, y), \qquad (2)$$

where $\omega$ is drawn from $\Omega$ uniformly at random.

Definition 3 (Monotone Markov chain). A transition function f for MC is monotone if for all $x, y \in S$ and for all $\omega \in \Omega$: $x \sqsubseteq y \Rightarrow f(x, \omega) \sqsubseteq f(y, \omega)$. A Markov chain MC is called monotone if there exists a monotone transition function f for it.

Sometimes we view f as a random function on S with $\Pr[f(x) = y] := |\{\omega \in \Omega : f(x, \omega) = y\}|/|\Omega|$.

Remark 2. Since for a given Markov chain there might be both monotone and non-monotone transition functions, we assume, to avoid ambiguities, that any monotone Markov chain is specified by a monotone transition function.
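Definitions 2 and 3 and Remark 2 can be illustrated on the hypercube chain of Example 1 below. The Python sketch (our own; the names, and in particular the alternative function g, are hypothetical) compares the transition function f of Example 1 with a function g that toggles u when σ = + and leaves the state unchanged when σ = −. Both induce the same transition probabilities via (2), but only f is monotone; for instance $\emptyset \subseteq \{0\}$, while g maps these two states to $\{0\}$ and $\emptyset$ under $\omega = (0, +)$.

```python
from fractions import Fraction
from itertools import combinations


def states(n):
    """All subsets of U = {0, ..., n-1}."""
    return [frozenset(c) for k in range(n + 1) for c in combinations(range(n), k)]


def omegas(n):
    return [(u, s) for u in range(n) for s in "+-"]


def f(X, omega):
    """Transition function of Example 1: add u for '+', remove u for '-'."""
    u, sign = omega
    return X | {u} if sign == "+" else X - {u}


def g(X, omega):
    """Hypothetical alternative: toggle u for '+', stay put for '-'."""
    u, sign = omega
    return X ^ {u} if sign == "+" else X


def induced_chain(t, n):
    """P(x, y) = |{w : t(x, w) = y}| / |Omega|, as in (2)."""
    P = {}
    for X in states(n):
        for w in omegas(n):
            key = (X, t(X, w))
            P[key] = P.get(key, 0) + Fraction(1, len(omegas(n)))
    return P


def is_monotone(t, n):
    """Definition 3: X <= Y implies t(X, w) <= t(Y, w) for every w."""
    S = states(n)
    return all(t(X, w) <= t(Y, w) for X in S for Y in S if X <= Y for w in omegas(n))
```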
A theorem in [9] states that for a monotone Markov chain MC, defined on a partially ordered set with unique minimal and maximal states, it suffices to consider couplings defined by a (monotone) transition function. Let $Q^*$ be the coupling defined by

$$(X_0, Y_0) := (s_{\min}, s_{\max}), \qquad (X_{t+1}, Y_{t+1}) := (f(X_t, \omega), f(Y_t, \omega)) \quad \text{for } t \ge 0, \qquad (3)$$

where $\omega$ is taken with uniform probability from $\Omega$ and f is a monotone transition function for MC.



Theorem 1 (adapted from [9]). Let $T^*$ be the coupling time of the coupling $Q^*$ for MC defined above and let $\tau(\varepsilon)$ be the mixing rate of MC. Let $n \in \mathbb{N}$ be the length of the longest chain in $(S, \sqsubseteq)$. Then for $0 < \varepsilon < 1/e$

$$\frac{1}{8(1 + \log_{1/\varepsilon} n)}\,E[T^*] \le \tau(\varepsilon).$$



Together with (1), this implies that for this type of Markov chain a coupling can be constructed in a generic manner, if one is willing to settle for bounds that are sufficient to prove polynomial time bounds, but maybe not optimal. We will refer to the coupling defined in (3) as the standard coupling for MC (or, more accurately, for f).



3 Main theorem

For what follows let S be a finite set partially ordered by $\sqsubseteq$ with unique minimal and maximal elements $s_{\min}$ and $s_{\max}$, and fix some ergodic Markov chain MC = (S, P) on S. For a function $h: S \to [0, \infty)$ define the functions $\bar{h}, \Delta[h]: S \to \mathbb{R}$ by

$$\bar{h}(x) := \sum_{y \in S} P(x, y)\,h(y) \qquad (4)$$

$$\Delta[h](x) := \bar{h}(x) - h(x) = \sum_{y \in S} P(x, y)\bigl(h(y) - h(x)\bigr) \qquad (5)$$

for all $x \in S$. The functions $\bar{h}$ and $\Delta[h]$ denote the expected value and difference of h after one transition, respectively, and they can be written, in vector notation, as $\bar{h} = Ph$ and $\Delta[h] = -(I - P)h$.²

We now express by linear constraints the conditions on h to be a rank function with the additional property that the expected difference in rank between any two states decreases in expectation after one transition. Let $\mathcal{H}$ be the set of all functions $h: S \to [0, \infty)$ that satisfy for all $x \sqsubset y \in S$

1. $h(y) > h(x)$ and

2. $h(y) - h(x) \ge \bar{h}(y) - \bar{h}(x)$, or equivalently $\Delta[h](y) \le \Delta[h](x)$.

² The expression I − P is also known as the Laplacian of P.

Lemma 1. $\mathcal{H}$ satisfies the following closure properties. For any $h, h' \in \mathcal{H}$,

1. $h + h' \in \mathcal{H}$,

2. $\alpha h \in \mathcal{H}$ for any $\alpha > 0$,

3. $h + c\mathbb{1} \in \mathcal{H}$ for any real constant c,

where $\mathbb{1}$ is the constant function: $\mathbb{1}(x) = 1$ for all $x \in S$.

Proof. Straightforward and omitted. □

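The above lemma allows us to normalize $\mathcal{H}$. Denote $x \sqsubset\!\cdot\, y$ if $x \sqsubset y$ and there is no $z \in S$ with $x \sqsubset z \sqsubset y$ (that is, the pairs $x \sqsubset\!\cdot\, y$ are the edges in the Hasse graph of the poset $(S, \sqsubseteq)$). By transitivity, two inequalities for every pair $x \sqsubset\!\cdot\, y$ are sufficient to express the constraints on h. Let $\mathcal{H}_0$ be the set of functions $h: S \to \mathbb{R}$ such that

H1 $h(s_{\min}) = 0$,

H2 for every pair $x \sqsubset\!\cdot\, y$: $h(y) - h(x) \ge 1$,

H3 for every pair $x \sqsubset\!\cdot\, y$: $\Delta[h](y) \le \Delta[h](x)$.

Clearly $\mathcal{H}_0 \subseteq \mathcal{H}$, and the closure properties in the last lemma imply that if $\mathcal{H}$ is not empty then neither is $\mathcal{H}_0$.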
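Conditions H1 to H3 are finite lists of linear inequalities and can be checked mechanically on a small chain. The Python sketch (our own; names hypothetical) does this for the hypercube chain of Example 1 below: $h(X) = |X|$ passes, with $\Delta[h](X) = (n - 2|X|)/(2n)$, whereas $h(X) = |X|^2$ satisfies H1 and H2 but violates H3 for n = 4.

```python
from fractions import Fraction
from itertools import combinations


def hypercube(n):
    """States, covering pairs and transition probabilities of the Example-1 chain."""
    states = [frozenset(c) for k in range(n + 1) for c in combinations(range(n), k)]
    covers = [(X, X | {u}) for X in states for u in range(n) if u not in X]
    P = {}
    for X in states:
        for u in range(n):
            for Y in (X | {u}, X - {u}):          # omega = (u, +) and omega = (u, -)
                P[(X, Y)] = P.get((X, Y), 0) + Fraction(1, 2 * n)
    return states, covers, P


def delta(h, states, P):
    """Delta[h](x) = sum_y P(x, y) (h(y) - h(x)), as in (5)."""
    return {x: sum(P.get((x, y), 0) * (h(y) - h(x)) for y in states) for x in states}


def in_H0(h, states, covers, P, s_min):
    """Check H1, H2 and H3 for h on the given chain."""
    d = delta(h, states, P)
    h1 = h(s_min) == 0
    h2 = all(h(y) - h(x) >= 1 for x, y in covers)
    h3 = all(d[y] <= d[x] for x, y in covers)
    return h1 and h2 and h3
```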

Theorem 2 (Main theorem). Let MC = (S, P) be a monotone Markov chain on a poset $(S, \sqsubseteq, s_{\min}, s_{\max})$ with unique minimal and maximal element, and monotone transition function f. For $h \in \mathcal{H}_0$ and $x \in S$ define the random variable $d(x) := h(f(x)) - h(x)$ and set

$$V(h) := \min\{E[(d(y) - d(x))^2 \mid x, y] : x \sqsubset y \in S\}.$$

Then the coupling time $T^*$ of the standard coupling $Q^*$ (defined in (3)) satisfies

$$E[T^*] \le h(s_{\max})^2 / V(h).$$

In particular, for the solution $h^*$ to the linear minimization program $\min h(s_{\max})$ subject to $h \in \mathcal{H}_0$, we have $E[T^*] \le h^*(s_{\max})^2 / V(h^*)$.

Remark 3. Note that because of condition H2 the convex set $\mathcal{H}_0$ is bounded from below. Conditions H1 and H2 together guarantee that any $h \in \mathcal{H}_0$ satisfies $h(x) \ge |x|$ = height of x in $(S, \sqsubseteq)$.

For the proof we adapt the following theorem from [7].

Lemma 2 ([7]). Let MC be a monotone Markov chain on a poset $(S, \sqsubseteq, s_{\min}, s_{\max})$ as above and let $\{(X_t, Y_t)\}$ be a coupling for MC with coupling time T. Let B > 0 and let $\Phi: S \times S \to [0, B]$ be a function satisfying $\Phi(X_t, Y_t) = 0$ iff t = T. Set $\Phi(t) := \Phi(X_t, Y_t)$ and $\Delta\Phi(t) := \Phi(t+1) - \Phi(t)$. If for some V > 0 and all values of $t \ge 0$

1. $E[\Delta\Phi(t) \mid X_t, Y_t] \le 0$ and

2. $\Phi(t) > 0 \Rightarrow E[(\Delta\Phi(t))^2 \mid X_t, Y_t] \ge V$,

then the expected coupling time satisfies

$$E[T] \le \frac{\Phi(0)\bigl(2B - \Phi(0)\bigr)}{V}.$$

Proof. Outline of proof of Theorem 2. Let $(X_t, Y_t)$ be the standard coupling. Since it is defined by a monotone transition function, we have $X_t \sqsubseteq Y_t$ for all $0 \le t \le T^*$. Any $h \in \mathcal{H}_0$ is strictly increasing, therefore $h(X_t) \le h(Y_t)$, and equality holds only at $t = T^*$. This means that we can use $\Phi(x, y) = h(y) - h(x)$ for $x \sqsubseteq y$ in Lemma 2.

A straightforward calculation, using the fact that $Q^*$ is a coupling, shows that for any $x \sqsubseteq y$ in S

$$E[\Phi(f(x), f(y)) \mid x, y] = \bar{h}(y) - \bar{h}(x), \qquad (6)$$

and since h satisfies condition H3 (and hence, by transitivity, $\Delta[h](y) \le \Delta[h](x)$ for all $x \sqsubseteq y$), the first hypothesis of the lemma is satisfied. The definition of V(h) guarantees the second hypothesis immediately. Finally, by H1, $\Phi(0) = B = h(s_{\max})$, and the upper bound in the lemma simplifies to

$$E[T^*] \le \frac{h(s_{\max})^2}{V(h)}.$$

□

Remark 4. In practice the quantity V(h) can be lower-bounded without difficulty. For example, we have (because $\Phi$ is integer valued)

$$V(h) \ge \min_{x \sqsubset y}\Pr[\Phi(f(x), f(y)) \neq \Phi(x, y) \mid x, y].$$

A common situation for which we can derive an explicit bound for V(h) is when the transitions of the Markov chain satisfy the following condition. Let $f: S \times \Omega \to S$ be the transition function for MC.

Condition 4. Assume that for all $x, y \in S$ with $x \sqsubset y$ there are $\omega_1, \omega_2 \in \Omega$ such that

$$x \sqsubset f(x, \omega_1) \ \text{and}\ f(y, \omega_1) = y, \qquad x = f(x, \omega_2) \ \text{and}\ f(y, \omega_2) \sqsubset y.$$

Theorem 3. Let MC, f and $T^*$ be as in Theorem 2. If the transition function f of MC satisfies Condition 4, then the coupling time $T^*$ of the standard coupling for MC satisfies

$$E[T^*] \le (h^*)^2 \cdot \frac{|\Omega|}{2},$$

where $h^*$ is the value obtained from the linear minimization program defined by $\min h(s_{\max})$ subject to $h \in \mathcal{H}_0$.

Proof (Outline). By Theorem 2, all that is left to do is to bound V(h), and the above condition implies that $V(h) \ge 2/|\Omega|$ for any $h \in \mathcal{H}_0$. □
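For the hypercube chain of Example 1 below, V(h) can be computed exactly by enumeration. In this Python sketch (ours; names hypothetical) both states are moved with the same $\omega$, as in the standard coupling, and for $h(X) = |X|$ the minimum is attained at covering pairs and equals $1/n = 2/|\Omega|$, so the bound $V(h) \ge 2/|\Omega|$ used in the proof above is attained for this chain.

```python
from fractions import Fraction
from itertools import combinations


def f(X, omega):
    """Transition function of Example 1: add u for '+', remove u for '-'."""
    u, sign = omega
    return X | {u} if sign == "+" else X - {u}


def min_second_moment(h, n):
    """V(h): min over X strictly below Y of E[(d(Y) - d(X))^2] with a shared omega."""
    omegas = [(u, s) for u in range(n) for s in "+-"]
    states = [frozenset(c) for k in range(n + 1) for c in combinations(range(n), k)]
    best = None
    for X in states:
        for Y in states:
            if X < Y:
                total = sum(((h(f(Y, w)) - h(Y)) - (h(f(X, w)) - h(X))) ** 2 for w in omegas)
                value = Fraction(total, len(omegas))
                best = value if best is None else min(best, value)
    return best
```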
In all the examples that follow, the state space S is a distributive sub-lattice of $2^U$ under set inclusion, for some finite set $U = [n] = \{1, \dots, n\}$. Therefore we use the simpler notation with set operations, i.e. $\subseteq$, $\cup$, $\cap$ for $\sqsubseteq$, join and meet, respectively.

Example 1 ( The hypercube ). Maybe the easiest examples for the concepts dehned 
in the previous sections is provided by the hypercube. Let U = [n] and consider 
the Markov chain on the n-dimensional hypercube 2^ defined by 

J? := {(u, -b), (u,-) :u eU} 



and the transition function 



r 



{X,(u,a)) 



X U {u} : if (T = -b 
X \ {u} : if (7 = — . 



It is easy to check that this chain is ergodic with uniform stationary distribution. Let $h(X) := |X|$ and $\Phi(X,Y) := h(Y) - h(X)$ for $X, Y \in 2^U$ such that $X \subseteq Y$; $\Phi$ is just the Hamming distance restricted to pairs $X \subseteq Y$. The function $h$ clearly satisfies conditions H1 and H2. To verify condition H3 note that for $X \in 2^U$

$$\Delta[h](X) = \frac{1}{2n}\,(n - 2|X|),$$

which is a decreasing function on $(2^U, \subseteq)$.

Condition 4 is easily verified, and we can apply theorem 3 to obtain a bound on the expected coupling time of the standard coupling:

$$E[T^*] \le h(U)^2 \cdot \frac{|\Omega|}{2} = n^3.$$



It can be shown that $T^*$ is concentrated around $\Theta(n \log n)$ ([1]), so our bound is quite bad. However, in less symmetric cases the theorem provides bounds that are comparable to (and sometimes better than) those provided by other methods (if other methods work at all).
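As a quick numerical illustration of Example 1, the Python sketch below (the function names are ours, not the paper's) evaluates the right-hand side of Theorem 3 as read above, $(h^*)^2|\Omega|/2$, with $h(U) = n$ and $|\Omega| = 2n$, and simulates the standard coupling on the hypercube from $\emptyset$ and $U$. Because both copies use the same move, a coordinate agrees forever once it has been chosen, so the empirical mean should be close to the coupon-collector value $nH_n$, far below $n^3$.

```python
import random


def theorem3_bound(h_top, omega_size):
    """Right-hand side of Theorem 3 as read here: (h*)^2 * |Omega| / 2."""
    return h_top ** 2 * omega_size / 2


def hypercube_coupling_time(n, rng):
    """Steps until the standard coupling of Example 1, started at the empty set and U, meets."""
    lo, hi = set(), set(range(n))
    steps = 0
    while lo != hi:
        u = rng.randrange(n)          # both copies use the same omega = (u, sign)
        if rng.random() < 0.5:        # sign "+": add u to both copies
            lo.add(u)
            hi.add(u)
        else:                         # sign "-": remove u from both copies
            lo.discard(u)
            hi.discard(u)
        steps += 1
    return steps


if __name__ == "__main__":
    n = 8
    rng = random.Random(1)
    times = [hypercube_coupling_time(n, rng) for _ in range(2000)]
    harmonic = sum(1 / i for i in range(1, n + 1))
    print("Theorem 3 bound:", theorem3_bound(n, 2 * n))
    print("empirical mean:", sum(times) / len(times))
    print("n * H_n:", n * harmonic)
```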



4 Down-sets of a partial order 

In this and the following section we study Markov chains on the set $\mathcal{I}$ of down-sets (or ideals) of a poset $(U, \trianglelefteq)$.¹ $\mathcal{I}$ is a distributive lattice and, by a theorem of Birkhoff, we can identify it with a distributive sub-lattice of $(2^U, \subseteq)$. Note that the Hasse graph of $\mathcal{I}$ (seen as a poset) is a subgraph of the $|U|$-dimensional hypercube, and neighbouring down-sets are those which differ in exactly one element. All of our results involve (various forms of) the degree of the poset.

¹ The symbol $\trianglelefteq$ is used here to avoid confusion with the partial order $\subseteq$ or $\preceq$ on the states of the Markov chains.
Definition 5 (Degree(s) of a poset). Let $(U, \trianglelefteq)$ be a poset and denote by $\Gamma^+(u) = \{v \in U : u \lessdot v\}$ the elements covering $u$ and by $\Gamma^-(u) = \{v \in U : v \lessdot u\}$ the ones covered by $u$. The degree of an element $u \in U$ is defined by

$$d(u) := |\Gamma^+(u)| + |\Gamma^-(u)|.$$

By the degree of a poset we mean the maximal degree of its elements: $\max\{d(u) : u \in U\}$. More generally, we let $d_i(u)$ denote the number of elements $v$ such that there is a directed path between $u$ and $v$ (in either direction) with at most $i$ edges in the Hasse graph. So $d_1(u) = d(u)$, and $d_*(u)$ denotes the number of elements comparable to $u$. We shall only use $d$, $d_2$ and $d_*$.



4.1 The simple Markov chain on down-sets 

For a down-set $X \in \mathcal{I}$ let $X^-$ be the set of maximal elements in $X$ and $X^+$ the set of minimal elements in $U - X$. It can be seen that

- if $u \notin X$ then $X \uplus \{u\}$ is a down-set iff $u \in X^+$, and

- if $u \in X$ then $X - \{u\}$ is a down-set iff $u \in X^-$.

The graph on $\mathcal{I}$ in which a down-set $X$ is connected with the down-sets $X \oplus \{u\}$, $u \in X^- \cup X^+$, is, as can easily be verified, connected (it is the Hasse graph of the poset $(\mathcal{I}, \subseteq)$).
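The two characterisations above can be checked mechanically on a small example. The following Python sketch is only an illustration under our own assumptions: a poset on $U = \{0, \dots, n-1\}$ is given by its covering pairs $(a, b)$ ("$a$ is covered by $b$"), and the diamond poset as well as all helper names (`below_sets`, `x_plus`, `x_minus`, ...) are hypothetical. It enumerates all down-sets and confirms both bullet points on each of them.

```python
from itertools import combinations

# Hypothetical example: the diamond poset 0 < 1, 0 < 2, 1 < 3, 2 < 3,
# given by its covering pairs (a, b), meaning "a is covered by b".
N = 4
COVERS = [(0, 1), (0, 2), (1, 3), (2, 3)]


def below_sets(n, covers):
    """below[u]: elements strictly below u (transitive closure of the covers)."""
    below = {u: set() for u in range(n)}
    for a, b in covers:
        below[b].add(a)
    for _ in range(n):  # n closure rounds are enough
        for u in range(n):
            for v in list(below[u]):
                below[u] |= below[v]
    return below


def is_down_set(S, below):
    return all(below[u] <= S for u in S)


def down_sets(n, below):
    return [frozenset(c) for r in range(n + 1)
            for c in combinations(range(n), r) if is_down_set(set(c), below)]


def x_plus(X, n, below):
    """X^+: minimal elements of U - X, i.e. the elements that can be added."""
    return {u for u in range(n) if u not in X and below[u] <= X}


def x_minus(X, below):
    """X^-: maximal elements of X, i.e. the elements that can be removed."""
    return {u for u in X if not any(u in below[v] for v in X)}


BELOW = below_sets(N, COVERS)
DOWN_SETS = down_sets(N, BELOW)

# Both characterisations of Section 4.1, checked on every down-set.
for X in DOWN_SETS:
    for u in range(N):
        if u in X:
            assert is_down_set(X - {u}, BELOW) == (u in x_minus(X, BELOW))
        else:
            assert is_down_set(X | {u}, BELOW) == (u in x_plus(X, N, BELOW))
```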

The simplest Markov chain on the down-sets of a poset $(U, \trianglelefteq)$ is just the random walk on this Hasse graph (with added self-loops to guarantee aperiodicity). Although we can prove rapid mixing of this chain only for degree-2 posets, it serves to illustrate the methods developed. We define a Markov chain $MC_0 = (\mathcal{I}, P)$ by the transition function $f : \mathcal{I} \times \Omega \to \mathcal{I}$, where $\Omega := U \times \{+,-\}$, and for $X \in \mathcal{I}$, $(u,\sigma) \in \Omega$

$$f(X,(u,\sigma)) := \begin{cases} X \oplus \{u\} & \text{if } u \in X^\sigma,\\ X & \text{otherwise.} \end{cases} \qquad (7)$$

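This implies the transition probabilities

$$P(X,Y) = \begin{cases} \dfrac{1}{2n} & \text{if } |X \oplus Y| = 1,\\[4pt] 0 & \text{if } |X \oplus Y| > 1,\\[4pt] 1 - \dfrac{|X^+| + |X^-|}{2n} & \text{if } X = Y. \end{cases}$$

It is not difficult to check that the chain is ergodic with uniform stationary distribution.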
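To see these transition probabilities concretely, the sketch below builds the exact matrix of $MC_0$ with rational arithmetic on the same hypothetical diamond poset, applying the transition function (7) to every move in $\Omega = U \times \{+,-\}$. The test checks that each row sums to 1, that the holding probability is $1 - (|X^+| + |X^-|)/2n$, and that $P$ is symmetric, which is one way to see that the uniform distribution is stationary. Helper names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical example: the diamond poset, as covering pairs (a, b) = "a is covered by b".
N = 4
COVERS = [(0, 1), (0, 2), (1, 3), (2, 3)]


def below_sets(n, covers):
    """below[u]: elements strictly below u (transitive closure of the covers)."""
    below = {u: set() for u in range(n)}
    for a, b in covers:
        below[b].add(a)
    for _ in range(n):
        for u in range(n):
            for v in list(below[u]):
                below[u] |= below[v]
    return below


def is_down_set(S, below):
    return all(below[u] <= S for u in S)


def down_sets(n, below):
    return [frozenset(c) for r in range(n + 1)
            for c in combinations(range(n), r) if is_down_set(set(c), below)]


def x_plus(X, n, below):
    return {u for u in range(n) if u not in X and below[u] <= X}


def x_minus(X, below):
    return {u for u in X if not any(u in below[v] for v in X)}


def f0(X, move, n, below):
    """Transition function (7) of MC_0: toggle u if the move is allowed, else stay."""
    u, sign = move
    if sign == "+" and u in x_plus(X, n, below):
        return X | {u}
    if sign == "-" and u in x_minus(X, below):
        return X - {u}
    return X


def transition_matrix(n, below):
    """Exact P(X, Y) = Pr[f(X, omega) = Y] for omega uniform on U x {+, -}."""
    states = down_sets(n, below)
    omega = [(u, s) for u in range(n) for s in "+-"]
    P = {(X, Y): Fraction(0) for X in states for Y in states}
    for X in states:
        for move in omega:
            P[(X, f0(X, move, n, below))] += Fraction(1, len(omega))
    return states, P


BELOW = below_sets(N, COVERS)
STATES, P = transition_matrix(N, BELOW)
```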



4.2 The ‘Barrier’ functions - a tool 

Before we continue, let us define the following set-valued functions. For a down-set $X \in \mathcal{I}$ and $i \in \mathbb{N}$ define

$$b_i^-(X) := \{u \in X : |\{v \in X : u \lhd v\}| = i\}$$

and dually

$$b_i^+(X) := \{u \in U - X : |\{v \in U - X : v \lhd u\}| = i\}.$$

Note that $X^+ = b_0^+(X)$ and $X^- = b_0^-(X)$. If $u \in b_i^+(X)$ there are $i$ elements that have to be added to $X$ before we can add $u$. These $i$ elements constitute a kind of barrier for the movement of the Markov chain in the direction of the coordinate $u$. Similarly, if $u \in b_i^-(X)$ then there are $i$ elements to be removed from $X$ before we are allowed to remove $u$. For another interpretation define for $X \in \mathcal{I}$ and $k \in \mathbb{N}$

$$B^+(X,0) := X, \qquad B^+(X,k) := X \uplus b_0^+(X) \uplus \cdots \uplus b_{k-1}^+(X)$$

and dually

$$B^-(X,0) := U - X, \qquad B^-(X,k) := (U - X) \uplus b_0^-(X) \uplus \cdots \uplus b_{k-1}^-(X).$$

Lemma 3. Let $(U, \trianglelefteq)$ be a poset with down-sets $\mathcal{I}$. For $k \in \mathbb{N}$ let $B^+, B^-$ be as defined above. Let $X, Y \in \mathcal{I}$ such that $X \subseteq Y$. Then for all $k \in \mathbb{N}$

$$B^+(X,k) \subseteq B^+(Y,k) \quad\text{and}\quad B^-(Y,k) \subseteq B^-(X,k).$$

Proof. Observe that for any down-set $I \in \mathcal{I}$ and $k \in \mathbb{N}^+$ the set $B^+(I,k)$ includes all elements $u$ that are in $I$ or that can be added to $I$ by first adding at most $k-1$ other elements to $I$. This implies that $B^+(I,k)$ is monotone growing in $I$, and hence $B^+(X,k) \subseteq B^+(Y,k)$.

The inclusion for $B^-$ is just the dual of the one for $B^+$ and can be obtained from it by ‘inverting’ $\mathcal{I}$. □
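The barrier sets and Lemma 3 can be explored in the same way. In this Python sketch (hypothetical helper names, the diamond poset as before) `b_plus(X, i, ...)` and `b_minus(X, i, ...)` follow the definitions of $b_i^+$ and $b_i^-$ literally, and the final loop exhaustively checks both inclusions of Lemma 3 for every pair $X \subseteq Y$ of down-sets and every $k \le |U|$.

```python
from itertools import combinations

N = 4
COVERS = [(0, 1), (0, 2), (1, 3), (2, 3)]  # hypothetical diamond poset, (a, b) = "a covered by b"


def below_sets(n, covers):
    below = {u: set() for u in range(n)}
    for a, b in covers:
        below[b].add(a)
    for _ in range(n):
        for u in range(n):
            for v in list(below[u]):
                below[u] |= below[v]
    return below


def is_down_set(S, below):
    return all(below[u] <= S for u in S)


def down_sets(n, below):
    return [frozenset(c) for r in range(n + 1)
            for c in combinations(range(n), r) if is_down_set(set(c), below)]


def b_plus(X, i, n, below):
    """b_i^+(X): u outside X with exactly i elements outside X strictly below u."""
    return {u for u in range(n) if u not in X and len(below[u] - X) == i}


def b_minus(X, i, below):
    """b_i^-(X): u in X with exactly i elements of X strictly above u."""
    return {u for u in X if sum(1 for v in X if u in below[v]) == i}


def B_plus(X, k, n, below):
    """B^+(X, k) = X together with b_0^+(X), ..., b_{k-1}^+(X)."""
    out = set(X)
    for i in range(k):
        out |= b_plus(X, i, n, below)
    return out


def B_minus(X, k, n, below):
    """B^-(X, k) = (U - X) together with b_0^-(X), ..., b_{k-1}^-(X)."""
    out = set(range(n)) - X
    for i in range(k):
        out |= b_minus(X, i, below)
    return out


BELOW = below_sets(N, COVERS)
DOWN_SETS = down_sets(N, BELOW)

# Lemma 3: B^+ grows and B^- shrinks along X <= Y, for every k.
for X in DOWN_SETS:
    for Y in DOWN_SETS:
        if X <= Y:
            for k in range(N + 1):
                assert B_plus(X, k, N, BELOW) <= B_plus(Y, k, N, BELOW)
                assert B_minus(Y, k, N, BELOW) <= B_minus(X, k, N, BELOW)
```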

4.3 Coupling time of $MC_0$

Before we can apply theorem 3 we have to check that $f$ is monotone.

Lemma 4. Let $X, Y \in \mathcal{I}$ and $\Omega$ be as defined above. For every $\omega \in \Omega$, if $X \subseteq Y$ then $f(X,\omega) \subseteq f(Y,\omega)$. Therefore $f$ is a monotone transition function and $MC_0$ a monotone Markov chain.

Proof. Let $X, Y \in \mathcal{I}$ such that $X \subseteq Y$. Consider first an ‘adding’ move $\omega = (u,+)$ for some $u \in U$. This move will change $X$ and add $u$ to it iff $u \in X^+$, i.e., if $u \notin X$ and all $v \lhd u$ are included in $X$, and therefore also in $Y$ since $X \subseteq Y$. If $u \in X^+$ then $u \in Y \uplus Y^+$ (by lemma 3 with $k = 1$). So if we can add $u$ to $X$, then either $u \in Y$ or we can add $u$ also to $Y$.

The analogous argument works for a ‘subtracting’ move, since $Y^- \subseteq (U - X) \uplus X^-$. □
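Lemma 4 is also easy to confirm exhaustively on small examples. The Python sketch below works under the same assumptions as before (posets given by covering pairs; the three example posets and all function names are ours) and applies the transition function (7) to every pair of comparable down-sets and every move.

```python
from itertools import combinations

POSETS = {  # hypothetical examples, as covering pairs (a, b) = "a is covered by b"
    "diamond": (4, [(0, 1), (0, 2), (1, 3), (2, 3)]),
    "N-shape": (4, [(0, 2), (1, 2), (1, 3)]),
    "chain5": (5, [(0, 1), (1, 2), (2, 3), (3, 4)]),
}


def below_sets(n, covers):
    below = {u: set() for u in range(n)}
    for a, b in covers:
        below[b].add(a)
    for _ in range(n):
        for u in range(n):
            for v in list(below[u]):
                below[u] |= below[v]
    return below


def is_down_set(S, below):
    return all(below[u] <= S for u in S)


def down_sets(n, below):
    return [frozenset(c) for r in range(n + 1)
            for c in combinations(range(n), r) if is_down_set(set(c), below)]


def x_plus(X, n, below):
    return {u for u in range(n) if u not in X and below[u] <= X}


def x_minus(X, below):
    return {u for u in X if not any(u in below[v] for v in X)}


def f0(X, move, n, below):
    """Transition function (7) of MC_0."""
    u, sign = move
    if sign == "+" and u in x_plus(X, n, below):
        return X | {u}
    if sign == "-" and u in x_minus(X, below):
        return X - {u}
    return X


def is_monotone(n, covers):
    """Exhaustive check of Lemma 4: X <= Y implies f(X, w) <= f(Y, w) for every move w."""
    below = below_sets(n, covers)
    states = down_sets(n, below)
    omega = [(u, s) for u in range(n) for s in "+-"]
    return all(f0(X, w, n, below) <= f0(Y, w, n, below)
               for X in states for Y in states if X <= Y
               for w in omega)


if __name__ == "__main__":
    for name, (n, covers) in POSETS.items():
        print(name, is_monotone(n, covers))
```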
4.4 Rapid mixing for degree-2 posets

Using the rank function $h : X \mapsto |X|$ we now show that for degree-2 posets the Markov chain $MC_0$ defined above mixes rapidly.

Lemma 5. Let $h : \mathcal{I} \to \mathbb{R}$ be defined by $h(X) := |X|$ for all $X \in \mathcal{I}$. Let $X \in \mathcal{I}$, and let $\Omega$ be as defined above. Then

$$\Delta[h](X) = \frac{1}{|\Omega|}\,\big(|X^+| - |X^-|\big).$$

Proof. For this specific function $h$ we can easily evaluate $\Delta[h]$:

$$|\Omega| \cdot \Delta[h](X) = \sum_{\omega \in \Omega} \big(h(f(X,\omega)) - h(X)\big) = \sum_{u \in X^+} \big(|X \uplus \{u\}| - |X|\big) + \sum_{u \in X^-} \big(|X - \{u\}| - |X|\big) = |X^+| - |X^-|.$$

□
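For Lemma 5, a brute-force evaluation of $\Delta[h]$ (averaging $|f(X,\omega)| - |X|$ over all $2n$ moves) can be compared with the closed form $(|X^+| - |X^-|)/|\Omega|$. The sketch uses exact fractions; the posets and names are again illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

POSETS = {  # hypothetical examples, as covering pairs (a, b) = "a is covered by b"
    "diamond": (4, [(0, 1), (0, 2), (1, 3), (2, 3)]),
    "N-shape": (4, [(0, 2), (1, 2), (1, 3)]),
    "chain5": (5, [(0, 1), (1, 2), (2, 3), (3, 4)]),
}


def below_sets(n, covers):
    below = {u: set() for u in range(n)}
    for a, b in covers:
        below[b].add(a)
    for _ in range(n):
        for u in range(n):
            for v in list(below[u]):
                below[u] |= below[v]
    return below


def is_down_set(S, below):
    return all(below[u] <= S for u in S)


def down_sets(n, below):
    return [frozenset(c) for r in range(n + 1)
            for c in combinations(range(n), r) if is_down_set(set(c), below)]


def x_plus(X, n, below):
    return {u for u in range(n) if u not in X and below[u] <= X}


def x_minus(X, below):
    return {u for u in X if not any(u in below[v] for v in X)}


def f0(X, move, n, below):
    u, sign = move
    if sign == "+" and u in x_plus(X, n, below):
        return X | {u}
    if sign == "-" and u in x_minus(X, below):
        return X - {u}
    return X


def drift(X, n, below):
    """Delta[h](X) for h(X) = |X|: average change of |X| over all 2n moves."""
    omega = [(u, s) for u in range(n) for s in "+-"]
    return Fraction(sum(len(f0(X, w, n, below)) - len(X) for w in omega), len(omega))


def lemma5_holds(n, covers):
    below = below_sets(n, covers)
    return all(drift(X, n, below)
               == Fraction(len(x_plus(X, n, below)) - len(x_minus(X, below)), 2 * n)
               for X in down_sets(n, below))
```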

Lemma 6. Let $h(X) := |X|$ for all $X \in \mathcal{I}$ and let $X, Y \in \mathcal{I}$ with $Y = X \uplus \{w\}$ for some $w \in X^+$. Then

$$\Delta[h](Y) - \Delta[h](X) \le \frac{d(w) - 2}{|\Omega|},$$

where $d(w)$ is the degree of $w$ in the poset $(U, \trianglelefteq)$.

Proof. Let us look at the difference between $X^+$ and $Y^+$. On the one hand $w \in X^+ \setminus Y^+$, and this is the only element in $X^+ \setminus Y^+$. On the other hand $v \in Y^+ \setminus X^+$ iff $v \in Y^+ = (X \uplus \{w\})^+$ and $w$ is the only element outside $X$ such that $w \lhd v$, i.e., iff

$$v \in \Gamma^+(w) \cap b_1^+(X).$$

This gives

$$|Y^+| - |X^+| = -1 + |\Gamma^+(w) \cap b_1^+(X)|. \qquad (8)$$

The difference between $X^-$ and $Y^-$ can be expressed similarly. The only element in $Y^- \setminus X^-$ is $w$. An element $v \in X^- \setminus Y^-$ iff $v \in X^-$ and $v \lessdot w$; therefore

$$|X^-| - |Y^-| = -1 + |\Gamma^-(w) \cap b_0^-(X)|. \qquad (9)$$

Apply lemma 5 and add equation (8) to equation (9) to derive:

$$\begin{aligned} |\Omega| \big(\Delta[h](Y) - \Delta[h](X)\big) &= (|Y^+| - |Y^-|) - (|X^+| - |X^-|)\\ &= |Y^+| - |X^+| + |X^-| - |Y^-|\\ &= -2 + |\Gamma^+(w) \cap b_1^+(X)| + |\Gamma^-(w) \cap b_0^-(X)|\\ &\le -2 + |\Gamma^+(w)| + |\Gamma^-(w)|\\ &= -2 + d(w). \end{aligned}$$

□
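The inequality of Lemma 6, and its consequence for degree-2 posets, can be tested the same way. In this Python sketch $d(w)$ is computed as the number of covering pairs containing $w$, which assumes that the input lists exactly the edges of the Hasse graph. The zig-zag "fence" on six elements is a hypothetical degree-2 example, and the "claw" (one element covered by three others) is a hypothetical degree-3 poset on which $\Delta[h]$ is not decreasing.

```python
from fractions import Fraction
from itertools import combinations

POSETS = {  # hypothetical examples, as covering pairs (a, b) = "a is covered by b"
    "diamond": (4, [(0, 1), (0, 2), (1, 3), (2, 3)]),
    "N-shape": (4, [(0, 2), (1, 2), (1, 3)]),
    "fence": (6, [(0, 1), (2, 1), (2, 3), (4, 3), (4, 5)]),
    "claw": (4, [(0, 1), (0, 2), (0, 3)]),
}


def below_sets(n, covers):
    below = {u: set() for u in range(n)}
    for a, b in covers:
        below[b].add(a)
    for _ in range(n):
        for u in range(n):
            for v in list(below[u]):
                below[u] |= below[v]
    return below


def is_down_set(S, below):
    return all(below[u] <= S for u in S)


def down_sets(n, below):
    return [frozenset(c) for r in range(n + 1)
            for c in combinations(range(n), r) if is_down_set(set(c), below)]


def x_plus(X, n, below):
    return {u for u in range(n) if u not in X and below[u] <= X}


def x_minus(X, below):
    return {u for u in X if not any(u in below[v] for v in X)}


def f0(X, move, n, below):
    u, sign = move
    if sign == "+" and u in x_plus(X, n, below):
        return X | {u}
    if sign == "-" and u in x_minus(X, below):
        return X - {u}
    return X


def drift(X, n, below):
    omega = [(u, s) for u in range(n) for s in "+-"]
    return Fraction(sum(len(f0(X, w, n, below)) - len(X) for w in omega), len(omega))


def degree(w, covers):
    """d(w); assumes `covers` lists exactly the edges of the Hasse graph."""
    return sum(1 for a, b in covers if w in (a, b))


def lemma6_holds(n, covers):
    below = below_sets(n, covers)
    for X in down_sets(n, below):
        for w in x_plus(X, n, below):
            diff = drift(X | {w}, n, below) - drift(X, n, below)
            if diff > Fraction(degree(w, covers) - 2, 2 * n):
                return False
    return True


def drift_decreasing(n, covers):
    """Condition H3 for h = |X|: Delta[h] never increases along a covering step."""
    below = below_sets(n, covers)
    return all(drift(X | {w}, n, below) <= drift(X, n, below)
               for X in down_sets(n, below) for w in x_plus(X, n, below))
```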
If the degree of the poset is bounded by 2, then the above lemma implies that $\Delta[h]$ is a decreasing function on the poset $(\mathcal{I}, \subseteq)$. This is equivalent to condition H3 for $MC_0$. The result of this section is summarized in the following theorem.

Theorem 4. If the degree of $(U, \trianglelefteq)$ is bounded by 2, then the coupling time $T^*$ of the standard coupling of $MC_0$ satisfies $E[T^*] \le n^3$, where $n = |U|$.

Proof. The Markov chain $MC_0$ is monotone (lemma 4) and satisfies condition 4 (we have omitted the proof of this). Therefore we can apply theorem 3 if $h$ is in the set $\mathcal{H}_0$ for $MC_0$. It is easy to check that the function $h : X \mapsto |X|$ satisfies the conditions H1 and H2 for $MC_0$. The last condition H3 is equivalent to the function $\Delta[h]$ being a decreasing function on $\mathcal{I}$, which is verified by lemma 6 for the case of a degree-2 poset. By theorem 3, with $h(U) = n$ and $|\Omega| = 2n$,

$$E[T^*] \le h(U)^2 \cdot \frac{|\Omega|}{2} = n^3.$$

□



4.5 $K_{m,m}$ as partial order

The complete bipartite graph on $2m$ elements $(U = V \uplus W, V \times W)$ can be seen as a partial order with $V$ the minimal elements and $W$ the maximal elements. As we shall see, the solution for the conditions $\mathcal{H}_0$ is very far from the simple function $h$ used in the previous section. The Hasse graph $G$ of the down-sets of this partial order consists of the hypercube $2^V$ and the ‘translated’ hypercube $\{C \uplus V : C \in 2^W\}$, connected at the down-set $V$. This graph has an extreme bottleneck at $V$: take $S := 2^V - \{V\}$. There are only $m$ edges out of $S$, and therefore $G$'s expansion is (at most) $m/|S| = m/(2^m - 1)$.

Theorem 5. There is a function $h \in \mathcal{H}_0$ for the Markov chain $MC_0$ on the down-sets of the partial order defined by the complete bipartite graph on $2m$ elements that satisfies

$$h(U) = m + 2\sum_{i=0}^{m-1-k} \frac{k \cdots (k+i)}{(m-k) \cdots (m-k-i)} = O(2^m),$$

where $k = \lceil m/2 \rceil$.

This gives a bound on the coupling time of $O(2^m)$, which is of the correct magnitude.

Proof. Since $\mathcal{I}$ is completely symmetric, we look for a solution $h(X)$ that depends only on $|X|$. For $i \in \{0, \dots, 2m\}$ write $h_i$ and $\hat h_i$ respectively for the value of $h$ and $\hat h$ on a down-set $X$ with $i$ elements. Then, using the fact that $\mathcal{I}$ consists of 2 hypercubes joined at level $|X| = m$, we derive for $0 \le i \le 2m$

$$|\Omega|(\hat h_i - h_i) = \begin{cases} (m-i)(h_{i+1}-h_i) - i(h_i-h_{i-1}) & : i < m\\ m(h_{m+1}-h_m) - m(h_m-h_{m-1}) & : i = m\\ (2m-i)(h_{i+1}-h_i) - (i-m)(h_i-h_{i-1}) & : m+1 \le i \end{cases} = \begin{cases} (m-i)\delta_i - i\delta_{i-1} & : i < m\\ m\delta_m - m\delta_{m-1} & : i = m\\ (2m-i)\delta_i - (i-m)\delta_{i-1} & : m+1 \le i \end{cases}$$

where $\delta_i := h_{i+1} - h_i$ for $i \in \{0, \dots, 2m-1\}$ (at $i = 0$ and $i = 2m$ the terms with a vanishing coefficient are omitted). Set $k := \lceil m/2 \rceil$. The solution we give satisfies

$$|\Omega|(\hat h_i - h_i) = \begin{cases} m - 2i & : i \le k-1\\ 0 & : k \le i \le 2m-k\\ 3m - 2i & : 2m-k < i. \end{cases}$$

For $X \in \mathcal{I}$ with $|X| = i$ we have $\Delta[h](X) = \hat h(X) - h(X) = \hat h_i - h_i$, which decreases with $i$. Therefore $\Delta[h]$ is a decreasing function on the lattice and satisfies condition H3.

Solving ‘backwards’ we find that $\delta_i = 1$ for $i \in \{0, \dots, k-1\} \cup \{2m-k, \dots, 2m-1\}$. For the range $i \in \{k, \dots, m-1\}$ we have the recursion $\delta_i = \frac{i}{m-i}\,\delta_{i-1}$. The next $m-k$ values, for $i \in \{m, \dots, 2m-k-1\}$, are the same, but in reverse order, until $\delta$ has decreased back to 1. Note that for all $0 \le i \le 2m-1$ we have $\delta_i \ge 1$, so that the solution satisfies condition H2. Finally define $h_0 := 0$ to satisfy condition H1.

For the $O(2^m)$ bound it suffices to consider the last term in the sum, since it is the largest:

$$\frac{k \cdots (m-1)}{(m-k) \cdots 1} = \frac{(m-1)!}{(m-k)!\,(k-1)!} = \binom{m-1}{k-1} \le 2^{m-1}.$$

The earlier terms are smaller than this one by the factors $\binom{m-1}{t}$, $t = 1, 2, \dots$, so the whole sum is within a constant factor of its last term.

□
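The backwards solution in the proof of Theorem 5 is easy to reproduce numerically. The Python sketch below (exact fractions; all names hypothetical) builds $\delta_0, \dots, \delta_{2m-1}$ with $k = \lceil m/2 \rceil$, evaluates the three-case expression for $|\Omega|(\hat h_i - h_i)$, and the test compares it with the claimed piecewise values, checks that it is non-increasing (H3), that $\delta_i \ge 1$ (H2), and that the largest increment is $\binom{m-1}{k-1}$. For a few even values of $m$ it also compares $h(U) = \sum_i \delta_i$ with the sum formula of Theorem 5.

```python
from fractions import Fraction
from math import ceil, comb


def kmm_deltas(m):
    """Increments delta_i = h_{i+1} - h_i of the solution constructed for K_{m,m}."""
    k = ceil(m / 2)
    delta = [Fraction(1)] * (2 * m)
    for i in range(k, m):               # recursion delta_i = i/(m-i) * delta_{i-1}
        delta[i] = Fraction(i, m - i) * delta[i - 1]
    for j in range(m - k):              # i = m .. 2m-k-1: same values in reverse order
        delta[m + j] = delta[m - 1 - j]
    return k, delta


def scaled_drift(m, delta):
    """|Omega| * (hat h_i - h_i) for i = 0..2m via the three-case formula."""
    out = []
    for i in range(2 * m + 1):
        up = delta[i] if i < 2 * m else 0      # coefficient of delta_i is 0 at i = 2m
        down = delta[i - 1] if i > 0 else 0    # coefficient of delta_{i-1} is 0 at i = 0
        if i < m:
            out.append((m - i) * up - i * down)
        elif i == m:
            out.append(m * up - m * down)
        else:
            out.append((2 * m - i) * up - (i - m) * down)
    return out


def claimed_value(m, k, i):
    """The piecewise values stated for the solution."""
    if i <= k - 1:
        return m - 2 * i
    if i <= 2 * m - k:
        return 0
    return 3 * m - 2 * i


def theorem5_sum(m, k):
    """m + 2 * sum_{i=0}^{m-1-k} k...(k+i) / ((m-k)...(m-k-i))."""
    total, term = Fraction(0), Fraction(1)
    for i in range(m - k):
        term *= Fraction(k + i, m - k - i)
        total += term
    return m + 2 * total


if __name__ == "__main__":
    for m in (4, 8, 12):
        k, delta = kmm_deltas(m)
        print(m, sum(delta), comb(m - 1, k - 1))
```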



5 A variation on Luby and Vigoda’s Markov chain 

Let $G$ be an undirected graph with maximal degree $\Delta$. In [8] Luby and Vigoda define a Markov chain on the independent sets of $G$. This chain generates an independent set $A$ with probability proportional to $\lambda^{|A|}$ for $\lambda > 0$, and mixes rapidly if $\lambda \le 1/(\Delta - 3)$. In the uniform case ($\lambda = 1$) this chain mixes rapidly when the degree of $G$ is bounded by 4.

Remark 5. For a poset $(U, \trianglelefteq)$ consider its comparability graph, which has vertex set $U$ and an edge for any pair $u, v$ such that either $u \lhd v$ or $v \lhd u$. There is an easy-to-see bijection between the down-sets of $\trianglelefteq$ and the independent sets of the comparability graph: just observe that the set of maximal elements of $X$, $X^-$, defines $X$ unambiguously, and that $X^-$ is an antichain for $\trianglelefteq$ and therefore an independent set in the comparability graph. This bijection allows us to apply Luby and Vigoda's chain to sample down-sets efficiently, provided the degree ($d_*$) of the comparability graph is bounded by 4.

A. Sharell

We can slightly improve on this by using a chain with fewer transitions: the transitions of the Luby-Vigoda chain are based on choosing a pair of comparable elements $u \lhd v$, whereas our chain chooses only pairs of elements at distance at most 2 from each other. As we shall see, the resulting chain mixes rapidly when $d_2(u) \le 4$ for all $u \in U$. For posets with 4 or more levels, $d_2$ might still be bounded by 4, whereas the degree of the comparability graph is necessarily greater than 4 (see figure 1).




Fig. 1. A poset demonstrating the difference between $d_*$ and $d_2$. In this poset $d_*(j) = 5$, but for the more local degree $d_2$ we have $d_2(j) = 4$.



5.1 Definition of the Markov chain

Let $(U, \trianglelefteq)$ be a poset with down-sets $\mathcal{I}$ and define the following binary relation on $U$:

$$u \lessdot_2 v \iff u \lessdot v \text{ or } \exists w \in U : u \lessdot w \lessdot v.$$

The random choice is to select a pair $u \lessdot_2 v$ and then to add or remove one or both of $u, v$. To be precise, define

$$\Omega := \{(u,v), (u,\bar v), (\bar u,\bar v) : u \lessdot_2 v\}$$

and the auxiliary function $f' : \mathcal{I} \times \Omega \to 2^U$

$$f'(X,(u,v)) := X \cup \{u,v\}, \qquad f'(X,(u,\bar v)) := (X \cup \{u\}) \setminus \{v\}, \qquad f'(X,(\bar u,\bar v)) := X \setminus \{u,v\}.$$



The possibility to remove $u$ and add $v$ is omitted because it can never result in a down-set. The transition function and probabilities of the Markov chain $MC_1 := (\mathcal{I}, P)$ are defined by

$$f(X,\omega) := \begin{cases} f'(X,\omega) & \text{if } f'(X,\omega) \in \mathcal{I},\\ X & \text{otherwise,} \end{cases} \qquad (10)$$

and

$$P(X,Y) := \Pr_\omega[f(X,\omega) = Y].$$

This definition is only interesting if the poset is non-trivial, i.e., there are $u, v \in U$ such that $u \lessdot_2 v$. Moreover, we can assume w.l.o.g. that the Hasse graph $(U, \lessdot)$ is (weakly) connected; otherwise we can treat each component by itself. It is not difficult to check that this chain is ergodic with uniform stationary distribution. We omit this.
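A direct implementation of $MC_1$ helps to see which moves are ever accepted. In this Python sketch (hypothetical names; posets given by covering pairs) a move is encoded as `(u, v, kind)`, with `kind` one of `"++"` for $(u,v)$, `"+-"` for $(u,\bar v)$ and `"--"` for $(\bar u,\bar v)$; the transition function (10) keeps $X$ whenever $f'$ leaves $\mathcal{I}$. The test confirms on small examples that $P$ is symmetric, hence doubly stochastic with the uniform distribution as stationary distribution.

```python
from fractions import Fraction
from itertools import combinations

POSETS = {  # hypothetical examples, as covering pairs (a, b) = "a is covered by b"
    "diamond": (4, [(0, 1), (0, 2), (1, 3), (2, 3)]),
    "chain5": (5, [(0, 1), (1, 2), (2, 3), (3, 4)]),
}


def below_sets(n, covers):
    below = {u: set() for u in range(n)}
    for a, b in covers:
        below[b].add(a)
    for _ in range(n):
        for u in range(n):
            for v in list(below[u]):
                below[u] |= below[v]
    return below


def is_down_set(S, below):
    return all(below[u] <= S for u in S)


def down_sets(n, below):
    return [frozenset(c) for r in range(n + 1)
            for c in combinations(range(n), r) if is_down_set(set(c), below)]


def two_step_pairs(n, covers):
    """All pairs u <_2 v: v covers u, or v covers some w that covers u."""
    up = {u: {b for a, b in covers if a == u} for u in range(n)}
    pairs = set()
    for u in range(n):
        for w in up[u]:
            pairs.add((u, w))
            pairs.update((u, v) for v in up[w])
    return sorted(pairs)


def mc1_moves(n, covers):
    return [(u, v, kind) for u, v in two_step_pairs(n, covers) for kind in ("++", "+-", "--")]


def f1(X, move, below):
    """Transition function (10): apply f' and keep X if the result is not a down-set."""
    u, v, kind = move
    if kind == "++":            # add u and v
        Z = X | {u, v}
    elif kind == "+-":          # add u, remove v
        Z = (X | {u}) - {v}
    else:                       # "--": remove u and v
        Z = X - {u, v}
    return Z if is_down_set(Z, below) else X


def mc1_matrix(n, covers):
    below = below_sets(n, covers)
    omega = mc1_moves(n, covers)
    states = down_sets(n, below)
    P = {(X, Y): Fraction(0) for X in states for Y in states}
    for X in states:
        for move in omega:
            P[(X, f1(X, move, below))] += Fraction(1, len(omega))
    return states, P
```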



5.2 Bounding the coupling time

Define now the function $h : \mathcal{I} \to \mathbb{R}$ by

$$h(X) := |X| \qquad (11)$$

for all $X \in \mathcal{I}$. We use theorem 3 to bound the coupling time of $MC_1$. To be specific, there are three tasks: (a) check that the transition function $f$ of $MC_1$ is monotone; (b) verify that $MC_1$ satisfies condition 4; and (c) show that $\Delta[h]$ is a decreasing function on $(\mathcal{I}, \subseteq)$ (condition H3). This suffices since $h$ satisfies conditions H1 and H2 independently of the Markov chain. Only the last of these three calculations is of interest here, and we leave the other two to the reader. First we derive an expression for $\Delta[h]$.

Lemma 7. Let $(U, \trianglelefteq)$ be a poset and let $h$ and $\Omega$ be as defined above. For every down-set $X$ of $\trianglelefteq$

$$|\Omega|\,\Delta[h](X) = \sum_{u \in X^+} d_2(u) + 2|b_1^+(X)| - \sum_{u \in X^-} d_2(u) - 2|b_1^-(X)|. \qquad (12)$$

Proof. Fix $u \in X^+$. For every $v \lessdot_2 u$ the move $(v,u)$ contributes $1/|\Omega|$, since necessarily $v$ is already in $X$. Similarly, for every $w$ such that $u \lessdot_2 w$ the move $(u,\bar w)$ contributes $1/|\Omega|$, since $w \notin X$. It should be clear that these are the only moves that contribute $1/|\Omega|$, and therefore each $u \in X^+$ contributes $d_2(u)/|\Omega|$.

The only possibility for a move $(u,v)$ to contribute $2/|\Omega|$ is when $u \in X^+$ and $v \in (X \uplus \{u\})^+$. Such a $v$ lies in $b_1^+(X)$; conversely, $v \in b_1^+(X)$ means that there is a unique element, namely $u$, outside $X$ such that $u \lhd v$. Moreover, this element $u$ is in $X^+$, else there would be another element $w \notin X$ such that $w \lhd u \lhd v$, which contradicts $v \in b_1^+(X)$. Therefore each $v \in b_1^+(X)$ contributes exactly $2/|\Omega|$. The negative contributions can be justified similarly. □
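Formula (12) can be checked against brute force. The Python sketch below (same hypothetical encoding of moves as for $MC_1$ above; example posets and names are ours) computes $|\Omega|\,\Delta[h](X)$ once by summing $|f(X,\omega)| - |X|$ over all moves and once from the right-hand side of (12), using $d_2$, $b_1^+$ and $b_1^-$.

```python
from itertools import combinations

POSETS = {  # hypothetical examples, as covering pairs (a, b) = "a is covered by b"
    "diamond": (4, [(0, 1), (0, 2), (1, 3), (2, 3)]),
    "N-shape": (4, [(0, 2), (1, 2), (1, 3)]),
    "chain5": (5, [(0, 1), (1, 2), (2, 3), (3, 4)]),
}


def below_sets(n, covers):
    below = {u: set() for u in range(n)}
    for a, b in covers:
        below[b].add(a)
    for _ in range(n):
        for u in range(n):
            for v in list(below[u]):
                below[u] |= below[v]
    return below


def is_down_set(S, below):
    return all(below[u] <= S for u in S)


def down_sets(n, below):
    return [frozenset(c) for r in range(n + 1)
            for c in combinations(range(n), r) if is_down_set(set(c), below)]


def x_plus(X, n, below):
    return {u for u in range(n) if u not in X and below[u] <= X}


def x_minus(X, below):
    return {u for u in X if not any(u in below[v] for v in X)}


def b_plus(X, i, n, below):
    return {u for u in range(n) if u not in X and len(below[u] - X) == i}


def b_minus(X, i, below):
    return {u for u in X if sum(1 for v in X if u in below[v]) == i}


def two_step_pairs(n, covers):
    up = {u: {b for a, b in covers if a == u} for u in range(n)}
    pairs = set()
    for u in range(n):
        for w in up[u]:
            pairs.add((u, w))
            pairs.update((u, v) for v in up[w])
    return sorted(pairs)


def mc1_moves(n, covers):
    return [(u, v, kind) for u, v in two_step_pairs(n, covers) for kind in ("++", "+-", "--")]


def f1(X, move, below):
    u, v, kind = move
    if kind == "++":
        Z = X | {u, v}
    elif kind == "+-":
        Z = (X | {u}) - {v}
    else:
        Z = X - {u, v}
    return Z if is_down_set(Z, below) else X


def d2(u, pairs):
    """Number of elements v with u <_2 v or v <_2 u."""
    return sum(1 for a, b in pairs if u in (a, b))


def scaled_drift(X, omega, below):
    """|Omega| * Delta[h](X) for h = |X|, by brute force over all moves."""
    return sum(len(f1(X, w, below)) - len(X) for w in omega)


def lemma7_rhs(X, n, below, pairs):
    """Right-hand side of (12)."""
    plus = sum(d2(u, pairs) for u in x_plus(X, n, below)) + 2 * len(b_plus(X, 1, n, below))
    minus = sum(d2(u, pairs) for u in x_minus(X, below)) + 2 * len(b_minus(X, 1, below))
    return plus - minus
```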

Lemma 8. Let $(U, \trianglelefteq)$ be a poset and let $\Omega$ be as defined above. Let $h$ be defined by equation (11). Let $X, Y \in \mathcal{I}$ such that $Y = X \uplus \{w\}$ for some $w \in X^+$. If $d_2(u) \le 4$ for all $u \in U$, then

$$\Delta[h](Y) \le \Delta[h](X).$$

Proof. By (12) the only elements that contribute to $\Delta[h](X)$ and $\Delta[h](Y)$ are those in $b_i^+(X)$, $b_i^-(X)$, $b_i^+(Y)$ and $b_i^-(Y)$ for $i = 0, 1$ (remember that $b_0^+(X) = X^+$ and $b_0^-(X) = X^-$). Therefore the only elements that may contribute a non-zero quantity to the difference $\Delta[h](Y) - \Delta[h](X)$ are those that are not in the same $b_i^+$ or $b_i^-$ set for $Y$ and for $X$. Since the only difference between $Y$ and $X$ is the element $w$, all those elements are comparable to $w$; as we shall see, they have to be at distance at most 2 from $w$. Let

$$K := |\Omega| \cdot \big(\Delta[h](Y) - \Delta[h](X)\big).$$

For $i > 0$ an element $u \in b_i^+(X)$ moves to $b_{i-1}^+(Y)$ iff $w \lhd u$. To calculate the contributions define for $i > 0$

$$D_i^+ := b_i^+(X) \cap \{u \in U : w \lhd u\}.$$

Case $i = 2$: an element $u \in D_2^+$ moves from $b_2^+(X)$ to $b_1^+(Y)$ and therefore, by (12), contributes 2 to $K$.

Case $i = 1$: an element $u \in D_1^+$ moves from $b_1^+(X)$ to $b_0^+(Y)$ and contributes $d_2(u) - 2$ to $K$ ($-2$ from the $X$ part and $d_2(u)$ from the $Y$ part).

Case $i = 0$: observe that the only element that moves from $X^+$ to $Y^-$ is $w$ itself. Inspecting expression (12) we see that this contributes $-2d_2(w)$ to $K$.

Now set, for $i \ge 0$,

$$D_i^- := b_i^-(X) \cap \{u \in U : u \lhd w\}$$

and proceed similarly: an element $u \in D_0^-$ moves from $X^- = b_0^-(X)$ to $b_1^-(Y)$ and accordingly contributes $d_2(u) - 2$ to $K$. An element $u \in D_1^-$ moves from $b_1^-(X)$ to $b_2^-(Y)$ and contributes 2 to $K$. Putting all this together gives

$$K = 2|D_2^+| + \sum_{u \in D_1^+} \big(d_2(u) - 2\big) - 2d_2(w) + \sum_{u \in D_0^-} \big(d_2(u) - 2\big) + 2|D_1^-|.$$

By assumption $d_2(u) \le 4$ for every $u \in U$; therefore we can bound all $(d_2(u) - 2)$ terms above by 2. This gives

$$K \le 2\big(|D_1^+| + |D_2^+| + |D_0^-| + |D_1^-| - d_2(w)\big). \qquad (13)$$

The barrier sets $b_i^+(X)$ and $b_i^-(X)$ partition $U$ and are pairwise disjoint. Therefore the sets $D_i^+$, $D_i^-$ are also disjoint. Moreover, it is not difficult to see that all elements in $D_1^+$ and in $D_0^-$ are at distance exactly 1 from $w$. Finally, observe that the elements in $D_2^+$ and $D_1^-$ are at distance at most 2 from $w$. To see this, assume for contradiction that some $u \in D_2^+$ (for example) is at distance greater than 2 from $w$. This means that there are $v, v'$ such that $w \lhd v \lhd v' \lhd u$. But $w$, $v$ and $v'$ are all outside $X$, which contradicts $u \in b_2^+(X)$.

The four sets contributing positively in (13) are disjoint subsets of the elements counted in $d_2(w)$, and therefore $K \le 0$ as desired. □
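Lemma 8 can likewise be checked exhaustively on small posets with $d_2 \le 4$. The Python sketch below (illustrative posets; names are ours) verifies that $\Delta[h]$ for $MC_1$ never increases when an element $w \in X^+$ is added. For the 5-element chain, the values of $|\Omega|\,\Delta[h]$ on the down-sets of sizes $0, \dots, 5$ are $4, 3, 1, -1, -3, -4$.

```python
from itertools import combinations

POSETS = {  # hypothetical examples with d_2(u) <= 4, as covering pairs (a, b)
    "diamond": (4, [(0, 1), (0, 2), (1, 3), (2, 3)]),
    "chain5": (5, [(0, 1), (1, 2), (2, 3), (3, 4)]),
    "fence": (6, [(0, 1), (2, 1), (2, 3), (4, 3), (4, 5)]),
}


def below_sets(n, covers):
    below = {u: set() for u in range(n)}
    for a, b in covers:
        below[b].add(a)
    for _ in range(n):
        for u in range(n):
            for v in list(below[u]):
                below[u] |= below[v]
    return below


def is_down_set(S, below):
    return all(below[u] <= S for u in S)


def down_sets(n, below):
    return [frozenset(c) for r in range(n + 1)
            for c in combinations(range(n), r) if is_down_set(set(c), below)]


def x_plus(X, n, below):
    return {u for u in range(n) if u not in X and below[u] <= X}


def two_step_pairs(n, covers):
    up = {u: {b for a, b in covers if a == u} for u in range(n)}
    pairs = set()
    for u in range(n):
        for w in up[u]:
            pairs.add((u, w))
            pairs.update((u, v) for v in up[w])
    return sorted(pairs)


def mc1_moves(n, covers):
    return [(u, v, kind) for u, v in two_step_pairs(n, covers) for kind in ("++", "+-", "--")]


def f1(X, move, below):
    u, v, kind = move
    if kind == "++":
        Z = X | {u, v}
    elif kind == "+-":
        Z = (X | {u}) - {v}
    else:
        Z = X - {u, v}
    return Z if is_down_set(Z, below) else X


def d2(u, pairs):
    return sum(1 for a, b in pairs if u in (a, b))


def scaled_drift(X, omega, below):
    return sum(len(f1(X, w, below)) - len(X) for w in omega)


def max_d2(n, covers):
    pairs = two_step_pairs(n, covers)
    return max(d2(u, pairs) for u in range(n))


def lemma8_holds(n, covers):
    """Delta[h](X + w) <= Delta[h](X) for every down-set X and every w in X^+."""
    below = below_sets(n, covers)
    omega = mc1_moves(n, covers)
    return all(scaled_drift(X | {w}, omega, below) <= scaled_drift(X, omega, below)
               for X in down_sets(n, below) for w in x_plus(X, n, below))
```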

It remains only to collect everything into

Theorem 6. Let $(U, \trianglelefteq)$ be a poset on $n := |U|$ elements. If $d_2(u) \le 4$ for all $u \in U$, then the coupling time $T^*$ of the standard coupling for $MC_1$ satisfies

$$E[T^*] \le 6n^3,$$

and therefore $MC_1$ is rapidly mixing.

Proof (Outline). It can be verified that the transition function $f$ is indeed monotone and that it satisfies condition 4. The function $h : X \mapsto |X|$ always satisfies the conditions H1 and H2 and, by lemma 8, given the hypothesis of the theorem on $d_2$, also the condition H3 for $MC_1$. Theorem 3 gives the bound

$$E[T^*] \le h(U)^2 \cdot \frac{|\Omega|}{2} = n^2 \cdot \frac{|\Omega|}{2} \le n^2 \cdot \frac{12n}{2} = 6n^3,$$

since $|\Omega| \le 3 \cdot 4n$, because $d_2(u) \le 4$ for all $u \in U$. □

6 Conclusions and further research

We have shown how to construct a simple but large linear program whose solution provides an upper bound on the mixing time of a Markov chain. While the size of the program makes direct solution infeasible, we hope that further research might yield insights into the associated polyhedra and therefore also into the behavior of the Markov chain. Since the underlying graph of most interesting Markov chains has small degree, the number of variables occurring in each constraint is small, and therefore our system is sparse in this sense.

Another reason why a brute-force attack on the linear program we define is not interesting is that, for the Markov chains we study, a Monte Carlo experiment allows one to estimate the coupling time $T^*$: simply simulate a coupling from the minimal and maximal state until they meet (see [9]). Repeating this, one can obtain an estimate of the expected coupling time with high confidence (say 99.999%). The time needed is polynomial in the expected coupling time itself. Therefore, if one has reason to hope that the chain mixes (more or less) rapidly, then this approach is clearly better than solving an exponentially large system of inequalities. However, if the chain does not mix rapidly enough, this approach becomes much less promising.
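The Monte Carlo estimate described above takes only a few lines for $MC_0$. The Python sketch below is an illustration, not the procedure of [9]: it runs the standard coupling, in which both copies use the same random move, from $\emptyset$ and $U$ until they meet, asserts along the way that the copies stay ordered (as Lemma 4 guarantees), and averages over independent runs. The fence poset, the number of trials and the seed are arbitrary choices, and the $n^3$ printed for comparison is the bound of Theorem 4 as stated above.

```python
import random

FENCE_N = 6
FENCE_COVERS = [(0, 1), (2, 1), (2, 3), (4, 3), (4, 5)]  # hypothetical degree-2 poset


def below_sets(n, covers):
    below = {u: set() for u in range(n)}
    for a, b in covers:
        below[b].add(a)
    for _ in range(n):
        for u in range(n):
            for v in list(below[u]):
                below[u] |= below[v]
    return below


def x_plus(X, n, below):
    return {u for u in range(n) if u not in X and below[u] <= X}


def x_minus(X, below):
    return {u for u in X if not any(u in below[v] for v in X)}


def f0(X, move, n, below):
    """Transition function (7) of MC_0."""
    u, sign = move
    if sign == "+" and u in x_plus(X, n, below):
        return X | {u}
    if sign == "-" and u in x_minus(X, below):
        return X - {u}
    return X


def coupling_time(n, below, rng):
    """Run the standard coupling of MC_0 from the minimum and maximum until they meet."""
    omega = [(u, s) for u in range(n) for s in "+-"]
    lo, hi = frozenset(), frozenset(range(n))
    steps = 0
    while lo != hi:
        move = rng.choice(omega)                # one shared random move for both copies
        lo, hi = f0(lo, move, n, below), f0(hi, move, n, below)
        assert lo <= hi                         # monotonicity keeps the copies ordered
        steps += 1
    return steps


def estimate_expected_coupling_time(n, covers, trials=1000, seed=0):
    rng = random.Random(seed)
    below = below_sets(n, covers)
    return sum(coupling_time(n, below, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    est = estimate_expected_coupling_time(FENCE_N, FENCE_COVERS)
    print(f"estimated E[T*] = {est:.1f}, n^3 = {FENCE_N ** 3}")
```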

As we have seen, solutions to the linear constraints can be derived directly for some simple examples, and these solutions seem to have the correct order of magnitude. For the chains on down-sets one cannot expect much better bounds, and for the chain on the independent sets (or equivalently down-sets) of $K_{m,m}$ the author has verified by numerical methods that the solution given is optimal (at least for $m$ up to 16). The recent paper [4] gives some variants of Markov chains on independent sets which are analyzed using path coupling. Their results improve on those in [8] mentioned in the preceding sections, mainly for non-uniform sampling of independent sets (in the uniform case the degree still needs to be bounded by 4). Interestingly, they show that the insert-delete chain on independent sets mixes rapidly not only when the degree is bounded by 2 (similar to our result on $MC_0$ in Theorem 4) but even when the degree is bounded by 4 (for the exact statement see [4, Thm. 5.5]). The proof is based on comparing the transition probabilities of the insert-delete chain with those of a more complicated chain on independent sets that mixes rapidly for degree-4 graphs. It seems possible that this method can be applied to show that the chain $MC_0$ mixes rapidly for the same posets as the chain $MC_1$. The author is currently investigating this conjecture.

The most interesting questions, however, are open:

Is the set of linear constraints $\mathcal{H}_0$ always feasible? The only contribution to this that we can offer at the moment is that if we restrict ourselves to solutions $h$ that depend only on $|X|$, as in most of the examples studied, then the answer is negative: there are bipartite graphs for which there is no such solution for the simple chain $MC_0$ on the independent sets of the graph.

Even more important is the question of whether rapid mixing of a chain is sufficient to guarantee a ‘small’ solution $h$. A positive answer would make the connection between the mixing rate and the polyhedra $\mathcal{H}_0$ much stronger. It might also open an approach to proving that a chain is not rapidly mixing.

We remark that while many interesting and important examples to which our technique applies are actually distributive lattices, we do not use this additional structure. At the moment, all that is used is the partial order relation and the uniqueness of the minimum and maximum. It would be very interesting if one could use the full structure of distributive lattices to obtain stronger results.

A Note on Bounding the Mixing Time by Linear Programming

Finally we mention the paper [6], which shows that the spectral gap of a Markov chain can be approximated (to a factor polynomial in the dimension) by semi-definite programming. Clearly this is a much stronger result, since it applies to general Markov chains and provides not only an upper bound but an approximation. However, our approach is much simpler, and there is more hope of understanding the polyhedra $\mathcal{H}_0$ than the structure of semi-definite programs.
Abstract. A random method for exploring a continuous unknown planar domain with almost no sensors is described. The expected cover time is shown to be proportional to the electrical resistance of the domain, thus extending an existing result for graphs [11]. An upper bound on the variance is also shown, and some open questions are suggested for further research.
1 Introduction

Exploring unknown terrain is an important issue in robotics. The problem has been intensively investigated, and several deterministic methods have been suggested and implemented. Most of those methods, however, rely on sophisticated, expensive and fragile systems of sensors (e.g. odometers, infra-red sensors, ultrasound radar or GPS), and/or sophisticated mapping algorithms. In this paper we suggest a minimalist approach in order to achieve the goal of covering with a minimum of sensing and computing, even if some performance reduction is implied. We show that, on average, a random walk is not too bad compared to deterministic algorithms that use much more sensing and computing to calculate their steps.

Formally, the on-line covering problem is to find a local rule of motion that will cause the robot to follow a space-covering curve, such that every point of the given region is in some prespecified $r$-neighborhood of the robot's trail, $r$ being the covering radius of the robot. Such a rule, if obeyed for a sufficient number of steps, should lead the robot to follow a covering path, which is a polygonal curve defined by the points $z_1, z_2, \dots, z_T$ that covers a region $R$, i.e.,

$$R = \bigcup_{t=1}^{T} B_r(z_t), \qquad (1)$$

where $B_r(z)$ is a disk of radius $r$ around $z$, and for all $i$, $|z_{i+1} - z_i| \le r$.¹ Note that the shape of $R$ is not known in advance.

Existing methods for graph search (e.g. BFS, DFS) cannot be used for our purpose, since no vertices or edges exist in our setting; a robot can move to arbitrary points on the continuum, while the BFS and DFS algorithms assume a discrete and finite set of possible locations. Also, those algorithms need memory whose size is, in general, proportional to the area to be explored. Yet another drawback of fully deterministic algorithms is their inability to provide a complete answer for realistic robotic problems, since both sensors and effectors are extremely vulnerable to noise and failures. As opposed to some purely computational problems, in robotics the environment of the robot is not known in advance, and even if it is, it may change during operation. One way to tackle these problems is to make the robot itself non-deterministic by introducing randomness into its behaviour. This motivates our algorithm for the covering problem. We call this method PC (Probabilistic Covering). The basic rule of behavior here is to make a short step and then a random turn. Somewhat surprisingly, the expected performance of the PC approach is not so bad; for example, it covers a gridded rectilinear polygon in average time $O(n\rho\log n)$, where $n$ is the area of the polygon and $\rho$ its “electrical resistance,” to be defined and explained below.

Some related work has already been done in various areas:

- Robotic covering: In previous work ([14], [15]) a discrete problem of graph exploration was solved using markers. More recently, the problem of covering a tiled floor was addressed in two different ways: in [29] the dirt on the floor served as memory to help the robot's navigation, while in [31] and [30] a vanishing trace was used for that purpose. In [6] the issue of inter-robot communication is addressed in the context of various missions, among them grazing, i.e. visiting every point of a region for purposes of object-fetching. There, a reactive model of behavior is presented, and simulation shows that detailed communication does not contribute much to the performance. In [5] many experimental works on planetary exploration by autonomous robots are presented. Heuristic navigation methods are given in [17] for path planning of an autonomous mobile cleaning robot, and in [20] for a robot exploration and mapping strategy. However, no rigorous analysis is given in the above references. In [18] an algorithm is presented for exploration of an undersea terrain, using exact location sensors and internal mapping. Practical implementations of covering algorithms have been demonstrated in [32] and [27]. In [32] a set of robots is described that help clean a railway station, using magnetic lines on the floor as guidelines. This method seems to work well, but is limited to pre-mapped regions. In [27] cooperation within a team of robots is created by an explicit level of inter-robot communication. Each robot can choose one of multiple possible behaviors, according to its specific conditions. In one of these behaviors the robot plays the role of a janitor, cleaning the dust around it.

- Randomization and uncertainty in robotic tasks: Uncertainty is an inherent factor in any real-life action, in particular one that relies on the information gained from sensors and on manipulations performed by actuators. One way to cope with uncertainty is randomization, i.e. introducing a random selection into the robot's control. In [16] and [22], randomization is used to (partially) overcome uncertainty in various robotic tasks. In a sense, our PC algorithm is an extreme case of randomization, in which almost no sensors are used.

- Random Walk and Covering: The analogy between random walks on graphs and the resistance of electrical networks was presented in [25], and later in [13], where it was used for investigating the recurrence properties of random walks on 1-, 2- and 3-dimensional grids. The rate of coverage of graphs by a random walk has been studied intensively. Two representative results in this context are the upper bounds of $O(mn)$ on the cover time of a graph with $m$ edges and $n$ vertices [2], and $O(m\rho\log n)$, where $\rho$ is the resistance of the graph, assuming all edges to be 1-Ohm resistors [11]. In [10] it was shown that several random walkers, if properly distributed in the graph, can bring a significant speed-up to the process of covering. Coverage of continuous domains by a Brownian motion process has been less investigated. A significant contribution was made in [24], where a simple relation was derived between the cover time and the hitting time in a strong Markov process. The current paper aims to make further progress in this direction, by relating the cover time of a Markov process (with discrete time and continuous location) to the electrical resistance of the explored region.

- Off-line covering: The off-line version of the problem (i.e. finding the shortest covering path for a given polygon) is NP-hard. The proof, as well as approximation algorithms for it, are presented in [1]. The related (NP-hard) problem of the optimal watchman route is to find the shortest path in a polygon such that every point of the polygon is visible from a point of the path. This problem is investigated in [12]. The goal there is to design a minimum-length path that will see each and every point in a given (i.e. known in advance) polygon.

¹ Note that if $r \to 0$, a covering path tends to be a space-filling curve [28], which is a continuous 1-dimensional curve that fills a 2-dimensional domain.

The rest of the paper is organized as follows. In Section 2 we show a lower bound on the length of any covering path. Then in Section 3 we describe the PC process and show that the expected cover time and its variance can be expressed in terms of the electrical resistance of the shape to be covered. In Section 4 we apply our results to prove the existence of a universal traversal sequence of angles, and then conclude with a discussion and some open questions.

2 A Universal Lower Bound on the Cover Time 

We shall now show a lower bound on the length of any covering path, independent 
of the algorithm used to generate it. 
Lemma 1. The number of points in a covering sequence of $r$-circles, say $Z = z_1, z_2, \dots, z_{T_c}$, such that $|z_{i+1} - z_i| \le r$, is bounded from below by

$$T_c \ge \frac{6\pi}{4\pi + 3\sqrt{3}}\,\frac{A}{a} - 1, \qquad (2)$$

where $A$ is the region's area and $a = \pi r^2$ is the area covered by the robot in a single step.

Proof: In each step (except, perhaps, the first one) the robot jumps at most a distance of $r$, and hence (due to overlapping) adds at most $(\sqrt{3}/2 + 2\pi/3)r^2$ to the covered area. Thus, after $T$ points, the covered area is at most $S_T = (T-1)(\sqrt{3}/2 + 2\pi/3)r^2 + \pi r^2$. Equating $S_{T_c}$ to $A$ implies the lemma. □
Remark: It is intuitively reasonable to assume that as $r$ decreases, the “quality of covering” improves, i.e. the amount of overlap is reduced. This intuition is made precise by the following result from [21]. Define $N(r)$ as the minimum number of $r$-circles needed to cover a region of area $A$. Then

$$\lim_{r \to 0} \frac{N(r)}{A/a} = \frac{2\pi}{\sqrt{27}}, \qquad (3)$$

and the minimum is attained in the “honeycomb” (hexagonal) arrangement of the circles, obtained by tiling the plane with congruent regular hexagons and circumscribing each hexagon with a circle. Note that the above result from [21] implies that, asymptotically, the cover time $T_c$ cannot go below $1.209\ldots \times (A/a)$, while Lemma 1 implies that for any value of $r$, $T_c > 1.06\ldots \times (A/a)$.

In the rest of the paper we shall confine ourselves to the problem of covering a unit-grid polygon of size $n$, i.e. a polygon made of a connected set of $n$ unit squares on the grid. Two squares are considered connected if they have a common edge. We shall also assume that the covering radius of the robot is $\sqrt{2}$; thus we have $A = n$ and $a = 2\pi$, and it follows from Lemma 1 that

Corollary 2. If $R$ is a unit-grid polygon of size $n$, then at least the number of steps given by (2) with $A = n$ and $a = 2\pi$ is necessary for a $\sqrt{2}$-radius robot to cover it.

The off-line version of the covering path problem (i.e. when the shape of R is given in advance) is known to be NP-hard, and there are various heuristics to solve it [1]. However, in many practical situations the on-line problem is more relevant, since an efficient on-line solution enables an autonomous robot to cover a region without the need to be pre-programmed with a detailed map, thus being able to serve different shapes with the same hardware. Other advantages of the on-line approach are the ability to tolerate changes in the geometry and topology of the environment, and the flexible mode of cooperation that can only be achieved via an on-line approach, while the pre-programmed one is severely limited in this respect. This, in addition to the high cost of implementing a reliable system of sensors (which is needed for deterministic covering algorithms), motivates our probabilistic approach to the covering problem.
3 PC (Probabilistic Covering) - A Randomized Approach to the Covering Problem

In this section we consider a robot that acts with (almost) no sensory inputs: it makes a step, chooses a random new direction, and then makes another step. Clearly, the average performance of this method is not optimal, but it has the advantage of being almost sensorless; thus it is cheap and tolerant. In fact, the only sensing required is for knowing how far we are from the boundary.

In the sequel we shall denote the r-disk around z by $B_r(z)$, and the r-circle around z by $C_r(z)$. Formally, the rule of motion is defined as follows:



/* PC - Probabilistic Covering with an r-disk */
Rule PC(z: current location)
    A) cover B_r(z);
    B) set μ(z) = min{ r, max{ ρ : B_2ρ(z) ⊆ R } };
           /* μ(z) is half the maximum radius */
           /* (not exceeding r)                */
           /* of a circle around z within R    */
    C) choose a random neighbor w from C_μ(z)(z);
    D) go to w;
end PC.



See Figures 1, 2 and 3 for examples of the process. Note that if $C_r(z)$ intersects the boundary of R, then the duration of a PC step is shorter than one unit of time, since the step length is $\mu(z) < r$. In each step the robot scans around to see if a boundary exists within distance r; hence we shall assume that the time spent at z is proportional to $\mu(z)^2$, where $\mu(z)$ is half the maximum radius, not exceeding r, of a circle around z within R. The reason for making the step length half the possible maximum is to avoid the chance of the robot going to $\partial R$, where it would get stuck forever since $\mu(z)$ vanishes on the boundary.
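Rule PC can be simulated in a few lines. The sketch below is only an illustration of the rule, not a reproduction of the paper's simulator: it assumes, for simplicity, that R is an axis-aligned width × height rectangle made of unit squares (the paper allows general grid polygons), and the names `step_length`, `pc_step` and `run_pc` are hypothetical. Each step moves $\mu(z)$ in a uniformly random direction, charges $(\mu(z)/r)^2$ units of time as assumed above, and records the unit square containing the robot.

```python
import math
import random


def step_length(z, r, width, height):
    """mu(z) for the rectangle [0, width] x [0, height] (an illustrative assumption).

    The largest disk around z that fits in a rectangle has radius d, the distance
    from z to the nearest side, so mu(z) = min(r, d / 2).
    """
    x, y = z
    d = min(x, width - x, y, height - y)
    return min(r, d / 2)


def pc_step(z, r, width, height, rng):
    """One PC step: move mu(z) in a uniformly random direction, charging (mu/r)^2 time."""
    mu = step_length(z, r, width, height)
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return (z[0] + mu * math.cos(theta), z[1] + mu * math.sin(theta)), (mu / r) ** 2


def run_pc(start, r, width, height, steps, seed=0):
    """Run up to `steps` PC steps; return the final point, elapsed time and visited squares."""
    rng = random.Random(seed)

    def square(p):
        # Unit square containing p; clamped in case rounding puts p on the far side.
        return min(int(p[0]), width - 1), min(int(p[1]), height - 1)

    z, elapsed, visited = start, 0.0, {square(start)}
    for _ in range(steps):
        if step_length(z, r, width, height) == 0.0:
            break  # on the boundary mu vanishes: the robot is stuck
        z, dt = pc_step(z, r, width, height, rng)
        elapsed += dt
        visited.add(square(z))
    return z, elapsed, visited


final, elapsed, visited = run_pc((2.0, 1.5), math.sqrt(2), 4, 3, steps=200, seed=1)
print(round(elapsed, 2), sorted(visited))
```

Because each step moves at most half the distance to the nearest side, the simulated robot never leaves R, and every step costs at most one time unit.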

We model the robot as a point that covers a circle of radius r around itself. Due to the random nature of PC, no deterministic bound can be stated on the cover time; we shall, however, derive some bounds on the expected cover time and its variance, and both will be given as functions of the electrical resistance of
^ A JAVA simulator of the PC process is web-accessible through:
http://www.cs.technion.ac.il/' wagner/puh/mac.html 





Robotic Exploration, Brownian Motion and Electrical Resistance 







Fig. 1. A lonesome PC robot; grey 
area has not yet been covered. 



Fig. 2. Four PC robots working together. A fellow robot is considered as an obstacle, hence no collisions should occur according to the PC rule.



a conductive material in the shape of R. This resistance can be further related to the geometrical properties of the robot and the region. More specifically, we prove the following:

1. Expected time of complete coverage: E, the expected time until full coverage of R (a unit-grid polygon of size n) by a PC robot which covers a radius of $\sqrt{2}$, is bounded by

$$2n\rho \le E \le 2n\rho \log n, \qquad (4)$$

where $\rho$ is the electrical resistance of R (assuming a material of unit sheet resistance, to be defined in the sequel). Note that the resistance $\rho = \rho(R)$ can sometimes be bounded in terms of the geometrical properties of the shape, and can always be numerically approximated. E.g. if R is a $\sqrt{n} \times \sqrt{n}$ square, then its resistance is $O(\log n)$ when measured between the bottom-left and top-right squares. In the case of an $a \times b$ rectangle with $a \ll b$, $\rho = O(b/a)$. Recall from Corollary 2 that any covering path must have a number of steps linear in n.

2. Variance in the cover time: V, the variance of the time of complete coverage, is bounded from above:

$$V \le 2^{11} n\rho, \qquad (5)$$

which yields an upper bound on the standard deviation of the cover time:

$$\sigma\left[t^{PC}_{cover}\right] \le 32\sqrt{2n\rho}. \qquad (6)$$



Our results can be extended to more general shapes, but this involves various types of cumbersome details that will be omitted in this extended abstract. Note that the above results are achieved without using any sensors except collision detectors (the robot cannot distinguish "tiles" or "grid squares"), and thus have almost no vulnerability to noise. The method can be used as is, or be combined with a sensor-based algorithm to achieve a tradeoff between cover time and coverage guarantee.

3.1 Analysis of the Cover Time by PC 

There is a wealth of results in the literature on cover times by random walks on graphs, a sample of which was mentioned in the introduction. Our case is different, however, since the robot can occupy any point in the continuum of the region, rather than being restricted to a finite set of such points. One may wish to partition the region into squares, and then consider a random walk on a graph with the set of squares as its vertex set; but this will not do, because the transition probabilities are not constant; rather, they depend on the precise location of the robot within a square (i.e. the process is not time-homogeneous). Hence we shall use continuous arguments to analyse the process.

We first observe that the PC process is a strong Markov process, since the probability of visiting a location depends only on the previous location and not on the earlier history: the robot has no memory. It was proved in [24] that under such a process, if $Q = \{q_1, q_2, \ldots, q_n\}$ is a collection of subsets of a set R, then $E[T(q_1, q_2, \ldots, q_n)]$, the expected time for visiting some point of every subset in Q (starting from anywhere in R), is bounded as follows:

$$h_{max} \le E[T(q_1, q_2, \ldots, q_n)] \le h_{max} \sum_{i=1}^{n} \frac{1}{i}, \qquad (7)$$

where $h_{max} = \max_{x \in R \setminus Q,\ 1 \le i \le n} \{h_i(x)\}$, and $h_i(x)$ is the expected time to first reach subset $q_i$ upon starting from $x \in R$. Let us denote the set of unit squares in R by $S = \{s_1, s_2, \ldots, s_n\}$. This partition is not known to the robot, but will serve us in our analysis. In order to establish bounds on the average cover time of the PC process, we further observe that (since the robot's covering radius is $r = \sqrt{2}$) if the robot has visited all the n squares in R, then R is totally covered. See Figure 3 for an example. Clearly, if a robot is located anywhere within such a square, the whole square is covered (actually, some of the neighboring squares are also partially covered, but this does no harm to our upper bound). Thus, visiting all the small squares is sufficient to guarantee full coverage of R. On the other hand, in order to cover R starting from any point in it, the robot must make, at least once, the tour between the two farthest squares in R. Let us define the hitting time (also known as access time or first-passage time) from a point $x \in R$ to a square $s_j$, denoted $h_j(x)$, as the expected time of a PC process that starts at x and ends upon first reaching a point in square $s_j$. We also define $C_{ij}$, the commute time between squares $s_i$ and $s_j$, as the expected time of a round trip from $s_i$ to $s_j$ and back, maximized over the starting points, i.e. $C_{ij} = C_{ji} = \max_{x \in s_i,\, y \in s_j} \{h_j(x) + h_i(y)\}$.
Fig. 3. A grid polygon R, partitioned into unit squares, and a possible sequence of PC steps which take random continuous locations $z_1, z_2, z_3$, thus covering the dashed circles. In this case, $\mu(z_2) > \mu(z_3)$ and hence the step size at time t = 2 is greater than at time t = 3. The dashed circles designate the covered area. Note that, since the covering radius is always $\sqrt{2}$ while the grid size is 1, it is sufficient to visit all squares in order to guarantee coverage of R.



It follows from Equation 7 (using $\sum_{i=1}^{n} 1/i < 2\log n$) and the above observations that the expected cover time of R can be bounded:

$$\max_{s_i, s_j \in R} \{C_{ij}\} \le E\left[t^{PC}_{cover}\right] \le 2(\log n) \max_{s_i, s_j \in R} \{C_{ij}\}. \qquad (8)$$
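The step from Equation 7 to Equation 8 replaces the harmonic sum by $2\log n$. Assuming natural logarithms, $\sum_{i=1}^{n} 1/i \le 1 + \ln n < 2\ln n$ holds once $\ln n > 1$, i.e. for $n \ge 3$ (it fails for $n = 2$). The sketch below, with illustrative function names of our own, checks this and evaluates the two bounds of Equation 8 for a given maximum commute time.

```python
import math


def harmonic(n: int) -> float:
    """H_n = 1 + 1/2 + ... + 1/n, the factor multiplying h_max in Equation (7)."""
    return sum(1.0 / i for i in range(1, n + 1))


def cover_time_bounds(n: int, max_commute: float) -> tuple[float, float]:
    """Lower and upper bounds of Equation (8), given max C_ij over all square pairs."""
    return max_commute, 2 * math.log(n) * max_commute


# H_n <= 1 + ln n < 2 ln n once ln n > 1, i.e. for n >= 3; it fails for n = 2.
for n in (2, 3, 10, 1000):
    print(n, round(harmonic(n), 3), round(2 * math.log(n), 3))
```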

In order to find the maximum commute time $C_{ij}$ in R, we now show that the commute time between any squares $s_i$, $s_j$ in R is proportional to the product of the number of squares in R and the electrical resistance between $s_i$ and $s_j$. The following lemma is a continuous analog of the result of [11], which related the commute time of a random walk on a graph to its electrical resistance, considering each edge as a 1-Ohm resistor.
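The graph statement cited from [11] says that, for a simple random walk on a connected graph with m edges, the commute time between two vertices equals $2m$ times their effective resistance when every edge is a 1-Ohm resistor. The following sketch, which is ours and not part of the paper, checks this identity exactly on small grid graphs: it solves the discrete hitting-time equations (a graph counterpart of Equation 9) and Kirchhoff's equations with rational arithmetic. The helper names (`grid_graph`, `hitting_times`, `effective_resistance`) are hypothetical.

```python
from fractions import Fraction


def solve(a, b):
    """Solve a x = b exactly with Gauss-Jordan elimination over Fractions."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def grid_graph(w, h):
    """Adjacency lists of a w x h grid of vertices (a hypothetical test graph)."""
    adj = {(x, y): [] for x in range(w) for y in range(h)}
    for (x, y), nbrs in adj.items():
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if (x + dx, y + dy) in adj:
                nbrs.append((x + dx, y + dy))
    return adj


def hitting_times(adj, target):
    """h(v) = 1 + average of h over the neighbours of v, with h(target) = 0."""
    nodes = [v for v in adj if v != target]
    idx = {v: i for i, v in enumerate(nodes)}
    a = [[Fraction(0)] * len(nodes) for _ in nodes]
    for v in nodes:
        a[idx[v]][idx[v]] += 1
        for w in adj[v]:
            if w != target:
                a[idx[v]][idx[w]] -= Fraction(1, len(adj[v]))
    h = dict(zip(nodes, solve(a, [Fraction(1)] * len(nodes))))
    h[target] = Fraction(0)
    return h


def effective_resistance(adj, s, t):
    """Potential at s when 1 A enters at s and leaves at t (grounded), 1-Ohm edges."""
    nodes = [v for v in adj if v != t]
    idx = {v: i for i, v in enumerate(nodes)}
    a = [[Fraction(0)] * len(nodes) for _ in nodes]
    b = [Fraction(0)] * len(nodes)
    b[idx[s]] = Fraction(1)
    for v in nodes:
        a[idx[v]][idx[v]] += len(adj[v])  # Kirchhoff's current law at v
        for w in adj[v]:
            if w != t:
                a[idx[v]][idx[w]] -= 1
    return solve(a, b)[idx[s]]


adj = grid_graph(3, 2)
edges = sum(len(nbrs) for nbrs in adj.values()) // 2
s, t = (0, 0), (2, 1)
commute = hitting_times(adj, t)[s] + hitting_times(adj, s)[t]
print(commute, 2 * edges * effective_resistance(adj, s, t))  # the two values agree
```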

Lemma 3. $C_{ij}$, the commute time between squares $s_i$ and $s_j$ in R, obeys $C_{ij} = 2n\rho_{ij}$, where n is the size of R and $\rho_{ij}$ is the electrical resistance between squares $s_i$ and $s_j$, assuming R to be made of a uniform material with unit sheet resistance.

Proof: Let us denote the maximum step size by r. In a step, the PC robot selects a random angle and goes in that direction. The length of the step is $\mu(z)$, half the maximum radius (not exceeding r) of a circle around z within R. As explained

® The sheet resistance of a material is defined as the voltage across a square of the 
material caused by one unit of current (i.e. one Ampere) that is flowing between two 
parallel edges of the square. The sheet resistance is commonly expressed in units of 
Ohms per square. 
before, we assume the time spent at z to be $(\mu(z)/r)^2$, which is one unit at an internal point of R (i.e. where $\mu(z) = r$), and less near the boundary, where $\mu(z) < r$ and steps are shorter (see Figure 3). If $z \notin s_j$, then the expected time to reach square $s_j$ from z is just the duration of the step plus the average of the access time over the $\mu(z)$-circle around z, i.e.

$$h_j(z) = \left(\frac{\mu(z)}{r}\right)^2 + \frac{1}{2\pi}\int_{\theta=0}^{2\pi} h_j\left(z + \mu(z)e^{i\theta}\right)d\theta, \qquad (9)$$

where $z + \mu(z)e^{i\theta}$ refers to the point at distance $\mu(z)$ from z at angle θ to the x axis, in complex notation. Clearly, if $z \in s_j$ then $h_j(z) = 0$.

Now consider R as a flat surface of a uniformly resistive material with unit sheet resistance, and assume that a current of $I_0 = 4/r^2$ Amperes per unit of area is uniformly injected into R, and $4n/r^2$ Amperes are rejected from R via the square $s_j$. Let us also denote the electric potential at point z relative to square $s_j$ by $\phi_j(z)$. Since, apart from the uniform injection, there are no current sources within R, we know from the Divergence Theorem (see, e.g. [19]) that for any closed surface, the amount of current entering the surface should equal the current exiting through it (i.e. the total current through the surface should vanish). Due to symmetry and uniformity of the resistance, the average potential around a circle of radius μ can be calculated:



Proposition 4. The average potential difference between the center and the circumference of a circle of radius μ on a uniform surface with unit sheet resistance, into which $I_0$ Amperes of current are uniformly injected per unit area, is

$$\bar{\phi}(0) - \bar{\phi}(\mu) = \frac{I_0 \mu^2}{4}. \qquad (10)$$



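The proof of Proposition 4 is deferred to the Appendix. Choosing $I_0 = 4/r^2$ one gets $\bar{\phi}(0) - \bar{\phi}(\mu) = (\mu/r)^2$ and hence (writing μ for $\mu(z)$ and $\phi_j(z)$ for the potential at z when the potential in square $s_j$ is kept at zero):

$$\phi_j(z) - \frac{1}{2\pi}\int_{\theta=0}^{2\pi} \phi_j\left(z + \mu e^{i\theta}\right)d\theta = \left(\frac{\mu}{r}\right)^2, \qquad (11)$$

or

$$\phi_j(z) = \left(\frac{\mu}{r}\right)^2 + \frac{1}{2\pi}\int_{\theta=0}^{2\pi} \phi_j\left(z + \mu e^{i\theta}\right)d\theta. \qquad (12)$$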

From the equivalence of Equations 9 and 12, and the uniqueness of the expectation function $h_j(z)$, we see that $h_j(z)$ is equal to the potential difference $\phi_j(z) - \phi_j(s_j)$ if $4/r^2$ units of current are injected into each unit of area, and $4n/r^2$ units of current are rejected from $s_j$. In a similar way one can show that $h_i(z) = \phi_i(z) - \phi_i(s_i)$ if $4/r^2$ units of current are injected into each unit of area, and $4n/r^2$ units of current are rejected from $s_i$. Now if we reverse the direction of all currents in the second case, we get that $h_i(z) = \phi_i(s_i) - \phi_i(z)$, where $\phi_i$ now denotes the potential of the reversed configuration, in which $4/r^2$ units of current are rejected from each unit of area across R and $4n/r^2$ units of current are injected into $s_i$. Due to the linearity of resistive electrical systems, we can superpose both sheets, thus making all currents cancel each other, except the $4n/r^2$ Amperes injected at $s_i$ and rejected from $s_j$. This, together with Ohm's law, implies that $C_{ij}$, the commute time between squares $s_i$ and $s_j$, obeys

® The function $h_j(z)$ is uniquely determined by
$$h_j(z) = \sum_{t=1}^{\infty} t \cdot \mathrm{Prob}\{\text{square } s_j \text{ is first reached from } z \text{ in } t \text{ steps}\} = \sum_{t=1}^{\infty} \frac{t}{(2\pi)^t}\int_{\theta_1=0}^{2\pi}\int_{\theta_2=0}^{2\pi}\cdots\int_{\theta_t=0}^{2\pi} A(\theta_1, \theta_2, \ldots, \theta_t)\,d\theta_1 d\theta_2 \cdots d\theta_t,$$
where $A(\theta_1, \theta_2, \ldots, \theta_t) = 1$ if the sequence of angles $\theta_1, \theta_2, \ldots, \theta_t$ leads from point z to (some point of) square $s_j$, and 0 otherwise.



$$C_{ij} = \max_{x \in s_i,\, y \in s_j}\{h_j(x) + h_i(y)\} = \max_{x \in s_i,\, y \in s_j}\{\phi_j(x) - \phi_i(y)\} = \frac{4n}{r^2}\rho_{ij}, \qquad (13)$$



where $\rho_{ij}$ is the maximal electrical resistance between squares $s_i$ and $s_j$ in R. This resistance is measured by injecting a 1-Ampere current into one square, say $s_i$, while rejecting it from $s_j$. Then the maximum potential difference between a point in $s_i$ and one in $s_j$ is equal to $\rho_{ij}$.

Substituting $r = \sqrt{2}$ in Equation 13 yields the Lemma. □

We now combine the above results to obtain

Theorem 5. $2n\rho \le E \le 2n\rho \log n$, where n is the size of R and ρ its resistance.

Proof: immediate, by substituting Lemma 3 in Equation 8. □

A corollary is implied for a square room:

Corollary 6. If R is a square $a \times a$ room, then

$$C_1 a^2 \log a \le E \le C_2 a^2 \log^2 a, \qquad (14)$$

where $C_1, C_2$ are small constants.

Proof (sketch): We use the fact that the resistance of a square is $O(\log a)$. Then we also note that for an $a \times a$ room, $n = a^2$, which, substituted into Theorem 5, implies the corollary. □

^ Ohm’s law says that the voltage drop between two points is equal to the product of 
the current flowing between the points and the point to point resistance. 

® It is of interest to mention a lumped-circuit analogy: a square $m \times m$ mesh of 1-Ohm resistors is known [11] to have resistance $O(\log n)$.
3.2 An upper bound on the variance of $t^{PC}_{cover}$



In order for our results to be useful, we now show that the variance of the cover time, denoted $V[t^{PC}_{cover}]$, is also bounded from above, and hence there is only a limited spread of the covering time around its average. It has been proved in [3] that the variance in the cover time of a set S is at most a constant times the expected time of covering the last item in the set, i.e.

$$V\left[T_{\text{cover of } S}\right] \le c_0 \cdot E\left[T_{\text{cover of the last item in } S}\right], \qquad (15)$$



where $c_0$ is a constant, at most $2^{10}$. Applying it to our case, we can use the maximum access time as an upper bound to the cover time of the last item (i.e. a yet-unvisited square), so we get:

$$V\left[t^{PC}_{cover}\right] \le c_0 \max_{s_i, s_j \in R}\{C_{ij}\} \le 2^{11} n\rho, \qquad (16)$$

which implies that the standard deviation is at most $32\sqrt{2n\rho}$.
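The constants in Equations (5), (6) and (16) fit together as $\sqrt{2^{11} n\rho} = 32\sqrt{2n\rho}$. The helper functions below (hypothetical names, for illustration only) make this explicit and also compare the standard-deviation bound with the lower bound $2n\rho$ on the expected cover time from Theorem 5: their ratio $32/\sqrt{2n\rho}$ drops below 1 once $n\rho \ge 512$.

```python
import math


def variance_bound(n: int, rho: float) -> float:
    """Right-hand side of Equation (16): 2^11 * n * rho."""
    return 2**11 * n * rho


def std_bound(n: int, rho: float) -> float:
    """Standard-deviation bound 32 * sqrt(2 n rho), the square root of Equation (16)."""
    return 32 * math.sqrt(2 * n * rho)


def relative_spread(n: int, rho: float) -> float:
    """std_bound divided by the lower bound 2 n rho on the expected cover time."""
    return std_bound(n, rho) / (2 * n * rho)


for n, rho in ((100, 3.0), (512, 1.0), (10**6, 10.0)):
    print(n, rho, round(relative_spread(n, rho), 4))
```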



4 A Universal Traversal Sequence of Angles 

Let us define a universal traversal sequence of angles (UTSA) for a family of planar sets $\mathcal{F}$ as a finite sequence of real numbers $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_m)$, all in $[0, 2\pi)$, such that if a PC robot takes the turn $\alpha_t$ in step t, it is guaranteed to cover any shape from $\mathcal{F}$, independent of the starting point. In this section we shall show that if $\mathcal{F}$ is the set of all n-size unit-grid polygons (i.e. polygons made of n attached $1 \times 1$ squares), then such a sequence exists and has a length polynomial in n. For this purpose we follow the probabilistic method invented by Erdős and used in [2] to prove that a sequence of length $O(n^3 \log n)$ exists that covers any edge-labelled k-regular graph with n vertices.

Theorem 7. There exists a sequence of $2n^4 \log n$ angles that guarantees covering of any rectilinear gridded polygon of size n.

Proof: First let us observe that if $\mathcal{F}$ is the set of all n-size unit-grid polygons, then $|\mathcal{F}| < 2^{n^2}$ (since all polygons of size n can be enclosed by an $n \times n$ square). We next apply Theorem 5 to obtain an upper bound of $t = 2n^2 \log n$ on the expected cover time of any polygon in $\mathcal{F}$, using the fact that the resistance ρ obeys $\rho \le n$ for such polygons. Hence, after a sequence of t random turns, the

® This value of the constant does not appear in [3], but can be calculated based on 
the analysis done there. 

A graph is k-regular if exactly k edges emanate from every vertex. It is edge-labelled if the edges emanating from each vertex are numbered in some order.
probability of complete coverage is at least 1/2, and after an mt-long sequence it is at least $1 - 2^{-m}$. On the other hand,

$$\mathrm{Prob}\{\exists R \in \mathcal{F} \text{ s.t. } R \text{ is not covered by a random } mt\text{-long sequence}\} \le \sum_{R \in \mathcal{F}} \mathrm{Prob}\{R \text{ is not covered by a random } mt\text{-long sequence}\} \le 2^{-m}|\mathcal{F}| < 2^{n^2 - m}. \qquad (17)$$

Hence if we choose $m \ge n^2$, then the probability that a random mt-long sequence fails to cover some polygon in $\mathcal{F}$ is less than one, i.e. there exists such a sequence which does guarantee covering of all polygons in $\mathcal{F}$, and hence there is a $(2n^4 \log n)$-long sequence of angles which is a UTSA for $\mathcal{F}$. □

Note that finding a universal sequence of length $O(4^n)$ is easy: just traverse the quaternary tree of height n with the starting point as the root and with four children for each vertex, each representing a turning angle from $\{0, \pi/2, \pi, 3\pi/2\}$. Backtracking is possible thanks to the "compass" that our robot has. Clearly, not all steps will be of length r because of walls and obstacles, but eventually all squares will be reached.
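The lengths of the two universal sequences can be compared directly. In this minimal sketch (function names are ours) we assume natural logarithms and ignore the constant factor hidden in $O(4^n)$; the random sequence of Theorem 7 has length $mt = n^2 \cdot 2n^2 \log n$. For very small n the exponential bound can be shorter (e.g. n = 5), but already at n = 10 the polynomial length is more than twenty times shorter.

```python
import math


def random_utsa_length(n: int) -> float:
    """Length m*t of the random angle sequence in the proof of Theorem 7."""
    t = 2 * n**2 * math.log(n)  # Theorem 5 with rho <= n (natural log assumed)
    m = n**2                    # repetitions making 2^(n^2 - m) at most 1
    return m * t                # equals 2 n^4 log n


def tree_sequence_order(n: int) -> int:
    """4^n, the order of the explicit quaternary-tree sequence (constant factors ignored)."""
    return 4**n


for n in (5, 10, 20):
    print(n, round(random_utsa_length(n)), tree_sequence_order(n))
```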



5 Summary 

We have shown that the expected cover time by a process of random steps in a continuous polygon is related to the electrical resistance of the polygon. The setting of continuous space is more relevant to robotics than the discrete structure of graphs, since robots move continuously, and even if a discrete partition is dictated by some external signs (e.g. a tiled floor), it is still hard for a low-cost robot to precisely identify those signs. The problem of continuous covering has various implications for both theory and practice. The analysis suggested in this paper can serve as an inspiration for further research in several directions, some of which are described below.

1. Cooperating PC robots: In a multi-robot setting we just add robots and let them all follow the same PC rule. It would be intriguing to see what happens if more significant communication is enabled, e.g. if a collision with another robot or with the wall makes the future steps biased against the (alleged) location of other robots/walls.

2. Finding a "short" universal traversal sequence of angles: We have shown the existence of a polynomial-length universal sequence of angles (UTSA) for gridded polygons. However, we do not know how to find one. The similar question for graphs is also wide open, with the only exceptions (known to us) being paths and cycles [9], [8]. Intuitively, one may think that finding a UTSA in our case is easier, since the robot is assumed to have a kind of "compass", while in the UTS problem for graphs, edges are arbitrarily ordered.
Appendix: Potential Difference Across A Uniformly-Resistive Circle

Proof of Proposition 4:

Consider a circle of radius μ and unit sheet resistance, and assume that a current of $I_0$ Amperes per unit area is uniformly injected into the circle. We seek the average potential difference (or "voltage drop") between the center of the circle and its circumference, defined by

$$\bar{\phi}(0) - \bar{\phi}(\mu) = \frac{1}{2\pi}\int_{\theta=0}^{2\pi}\left(\phi(0) - \phi(\mu e^{i\theta})\right)d\theta. \qquad (18)$$

Consider a ring of radius u and infinitesimal width du (see Figure 4).

We know (from the Divergence Theorem) that, since there are no sources or sinks of current on the surface other than the uniform injection, all the current injected into the u-circle should flow out across its boundary and into the ring. This amount of current is $I_0 \pi u^2$.
Fig. 4. An infinitesimal ring within a circle. The average voltage drop across the ring is obtained by integrating over small trapezoids like the one in gray, through which the centrifugal current I(u, θ) is flowing.



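Let us denote by $I(u, \theta)$ the centrifugal current (per unit angle) flowing at $ue^{i\theta}$ in direction θ, by $d\phi(u, \theta)$ the voltage drop between the inner and outer edges of an infinitesimal part of the ring, and by $d\phi(u)$ the average voltage drop across the ring. The trapezoid at angle θ carries the current $I(u, \theta)\,d\theta$ and, being approximately a rectangle of length du and width $u\,d\theta$, has resistance $du/(u\,d\theta)$ (the resistance of a rectangle is length/width); hence $d\phi(u, \theta) = I(u, \theta)\,du/u$. One can now write

$$d\phi(u) = \frac{1}{2\pi}\int_{\theta=0}^{2\pi} d\phi(u, \theta)\,d\theta = \frac{du}{2\pi u}\int_{\theta=0}^{2\pi} I(u, \theta)\,d\theta = \frac{du}{2\pi u}\, I_0 \pi u^2 = \frac{I_0 u\,du}{2}. \qquad (19)$$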



Note that the voltage drop across the ring due to the current flowing into the ring itself is proportional to the product of this current ($O((u + du)^2 - u^2) = O(u\,du)$) and the ring's resistance ($O(du/u)$), hence is $O((du)^2)$, and vanishes in integration. Thus the total voltage difference can be found by integrating along u:
u: 



$$\bar{\phi}(0) - \bar{\phi}(\mu) = \int_{u=0}^{\mu} d\phi(u) = \int_{u=0}^{\mu} \frac{I_0 u}{2}\,du = \frac{I_0 \mu^2}{4}. \qquad (20)$$

□
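The ring-by-ring argument of Equations (19) and (20) can be mirrored numerically. The sketch below is our own illustration (the function name and the number of rings are arbitrary choices): it cuts the circle into thin rings, applies Ohm's law to each ring with the current $I_0 \pi u^2$ crossing it and the resistance $du/(2\pi u)$, and adds up the drops. With $I_0 = 4/r^2$ and $r = \sqrt{2}$ it reproduces $(\mu/r)^2$, the value used in the proof of Lemma 3.

```python
import math


def ring_sum_voltage_drop(i0: float, mu: float, rings: int = 10_000) -> float:
    """Sum of average voltage drops across `rings` thin rings of a disk of radius mu.

    Each ring at radius u and width du carries the current i0 * pi * u^2 injected
    inside it, through a resistance du / (2 * pi * u) (length / width, unit sheet
    resistance), as in the derivation of Equation (19).
    """
    du = mu / rings
    total = 0.0
    for k in range(rings):
        u = (k + 0.5) * du                   # midpoint radius of the k-th ring
        current = i0 * math.pi * u * u       # all current injected inside radius u
        resistance = du / (2 * math.pi * u)  # thin ring: length du, width 2*pi*u
        total += current * resistance        # Ohm's law: drop = current * resistance
    return total


r = math.sqrt(2)
mu = 0.7
print(ring_sum_voltage_drop(4 / r**2, mu), (mu / r) ** 2)  # both approximately 0.245
```

Since every ring contributes $I_0 u\,du/2$, which is linear in u, the midpoint sum matches $I_0\mu^2/4$ up to floating-point rounding.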
Abstract. We are interested in the fringe analysis of synchronized parallel insertion algorithms on 2-3 trees, namely the algorithm of W. Paul, U. Vishkin and H. Wagener (PVW). This algorithm inserts k keys into a tree of size n with parallel time $O(\log n + \log k)$.

Fringe analysis studies the distribution of the bottom subtrees, and it is still an open problem for parallel algorithms on search trees. To tackle this problem we introduce a new kind of algorithm whose two extreme cases seem to bound the performance of the PVW algorithm from above and below.

We extend the fringe analysis to parallel algorithms and we get a rich mathematical structure giving new interpretations even in the sequential case. The process of insertions is modeled by a Markov chain, and the coefficients of the transition matrix are related to the expected local behavior of our algorithm. Finally, we show that this matrix has a power expansion in $(n+1)^{-1}$ where the coefficients are the binomial transform of the expected local behavior. This expansion shows that the parallel case can be approximated by iterating the sequential case.

Keywords: Fringe analysis, Parallel algorithms, 2-3 trees, Binomial transform.



1 Introduction 

Fringe analysis studies the distribution of the bottom subtrees or fringe of trees and has been applied to most search trees in the sequential case [EZG+82, BY95]. We are interested in the fringe analysis of the synchronized parallel algorithms on 2-3 trees designed by W. Paul, U. Vishkin and H. Wagener [PVW83]. This algorithm inserts k randomly selected keys with k processors in time $O(\log n + \log k)$ into a 2-3 tree of size n. The fringe analysis in this case is still open, and the main obstacle is the reconstruction phase, which is composed of waves of synchronized processors that modify the tree bottom-up.

In this paper we propose a new synchronized parallel algorithm, denoted 
MacroSplit, that bounds the [PVW83] one in the following sense: the distribution 

Fig. 1. The transformation of x and y bottom nodes after insertion of one key. In (1) the key b hits a bottom node x (containing the key a). Node x transforms into a node y (having keys a and b). We have $X_{t+1} = X_t - 2$ and $Y_{t+1} = Y_t + 3$. In case (2) the key c hits a bottom node y containing a and b. Then the node y splits into 2 nodes x containing a and c respectively, while b is inserted in the parent node recursively. Now $X_{t+1} = X_t + 4$ and $Y_{t+1} = Y_t - 3$.



of the fringe derived from the [PVW83] algorithm is bounded above and below by the distributions derived from two extreme cases of our algorithm. The key idea is that our algorithm reconstructs the tree with only one wave, whereas [PVW83] needs a pipeline of waves.

We have extended the fringe analysis from the sequential case to the parallel case with significant improvements. As shown later, the direct extensions of this technique to two concrete cases (the parallel insertion of two and three keys) suggest that the technique is inapplicable to cases larger than these simple ones. We have overcome this drawback with two facts that allow us to analyze the generic case (the insertion of k keys):

— The random insertion of keys generates a binomial distribution on the bottom nodes. This fact allows the probabilistic analysis of the parallel algorithm.

— The fringe evolution is determined by the expected local behavior of the algorithm. This fact gives a new understanding of fringe analysis.

The rest of the paper is organized as follows. In section 2 we recall the 
fringe analysis of the sequential case. In section 3 we introduce the MacroSplit 
algorithms. Section 4 contains the direct extension of the fringe analysis for the 
parallel introduction of two and three keys. Section 5 contains the analysis of 
the generic case and section 6 the analysis of two concrete cases of this generic 
case. Finally section 7 contains the conclusions. 

2 Sequential case 

The fringe of a tree is composed of the subtrees in the bottom part of the tree. Our fringe is composed of trees of height one. A bottom node with one key is called an x node, and a bottom node with two keys is called a y node. These nodes separate leaves into 1-type leaves if their parents are x nodes and 2-type leaves if their parents are y nodes.
Fig. 2. Choices for MacroSplit rules. In (i) the MaxMacroSplit rule creates the maximum number of splits. In (ii) the MinMacroSplit rule creates the minimum number. Intermediate strategies are allowed.




Let $X_t$ and $Y_t$ be the random variables associated with the number of 1-type leaves and 2-type leaves, respectively, at step t. We assume that $X_t + Y_t = n + 1$, n being the number of keys of the tree. When a new key falls into a bottom node, this node is transformed according to the rules given in Figure 1. The probability that a key hits a bottom node x is $X_t/(n+1)$, and for a node y it is $Y_t/(n+1)$. The conditional expectations verify

$$E(X_{t+1} \mid X_t, Y_t, 1) = \frac{X_t}{n+1}(X_t - 2) + \frac{Y_t}{n+1}(X_t + 4) = \left(1 - \frac{2}{n+1}\right)X_t + \frac{4}{n+1}Y_t,$$

$$E(Y_{t+1} \mid X_t, Y_t, 1) = \frac{X_t}{n+1}(Y_t + 3) + \frac{Y_t}{n+1}(Y_t - 3) = \frac{3}{n+1}X_t + \left(1 - \frac{3}{n+1}\right)Y_t.$$

n + 1 V n -p 1 



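The expected number of leaves (conditioned on the random insertion of one key) at step t can be modeled by [EZG+82, BY95]:

$$\begin{pmatrix} E(X_{t+1} \mid 1) \\ E(Y_{t+1} \mid 1) \end{pmatrix} = T_{n,1}\begin{pmatrix} E(X_t \mid 1) \\ E(Y_t \mid 1) \end{pmatrix}.$$

As the conditional expectations verify $E(X_{t+1} \mid 1) = E(E(X_{t+1} \mid X_t, Y_t, 1) \mid 1)$ and $E(Y_{t+1} \mid 1) = E(E(Y_{t+1} \mid X_t, Y_t, 1) \mid 1)$, we get from the preceding expression the 1-OneStep transition matrix

$$T_{n,1} = \left(1 + \frac{1}{n+1}\right)I + \frac{1}{n+1}\begin{pmatrix} -3 & 4 \\ 3 & -4 \end{pmatrix},$$

I being the identity matrix

$$I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$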



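In order to compare with the parallel case, we consider the sequential insertion of k keys, given by the product $T_{n+k-1,1} \cdots T_{n,1}$. It is easy to prove

$$T^S = \prod_{i=0}^{k-1} T_{n+i,1} = \left(1 + \frac{k}{n+1}\right)I + \frac{k}{n+1}\begin{pmatrix} -3 & 4 \\ 3 & -4 \end{pmatrix} + O\left(\frac{1}{(n+1)^2}\right).$$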
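The expansion of $T^S$ can be checked with exact rational arithmetic. In the sketch below (helper names are ours) `expansion(n, k)` is the first-order part $(1 + \frac{k}{n+1})I + \frac{k}{n+1}A$ with $A = \begin{pmatrix}-3 & 4\\ 3 & -4\end{pmatrix}$; with k = 1 it is exactly $T_{n,1}$. The product of k one-key matrices differs from it by a remainder which, multiplied by $(n+1)^2$, stays bounded as n grows, as the $O(1/(n+1)^2)$ term asserts.

```python
from fractions import Fraction

A = [[-3, 4], [3, -4]]


def matmul(p, q):
    """2x2 matrix product."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def expansion(n, k):
    """(1 + k/(n+1)) I + k/(n+1) A: the first-order part of T^S (T_{n,1} when k = 1)."""
    c = Fraction(k, n + 1)
    return [[(1 + c if i == j else 0) + c * A[i][j] for j in range(2)] for i in range(2)]


def sequential(n, k):
    """T^S = T_{n+k-1,1} ... T_{n,1}, with T_{m,1} = expansion(m, 1)."""
    prod = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
    for i in range(k):
        prod = matmul(expansion(n + i, 1), prod)  # later insertions multiply on the left
    return prod


def scaled_remainder(n, k):
    """Largest entry of (T^S - expansion) multiplied by (n+1)^2."""
    s, e = sequential(n, k), expansion(n, k)
    return max(abs(s[i][j] - e[i][j]) for i in range(2) for j in range(2)) * (n + 1) ** 2


for n in (10, 100, 1000):
    print(n, float(scaled_remainder(n, 3)))
```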
k   x node         y node
1   y              xx
2   xx             xy
3   xy             xxx or yy
4   xxx or yy      xxy
5   xxy            xxxx or xyy
6   xxxx or xyy    xxxy or yyy

Table 1. MacroSplit possibilities for x and y bottom nodes once k keys are inserted.



3 MacroSplit parallel insertion algorithms 

We introduce a parallel insertion algorithm based on the idea of MacroSplit. In this algorithm an array of ordered keys a[1..k] is inserted into a 2-3 tree having n leaves. The MacroSplit insertion algorithm has two main successive phases.

Percolation Phase. In a top-down strategy, the set of keys to be inserted is split into several packets and these packets are routed down. Finally, these packets are attached to the leaves [PVW83].

Reconstruction Phase. In a bottom-up phase the packets attached to the leaves are actually inserted and the tree is reconstructed. This reconstruction is based on a single wave moving bottom-up. First, the packets are incorporated at the bottom internal nodes of the tree. In successive steps the wave moves up, decreasing the depth by one unit at each step. The evolution of this single wave requires rules called MacroSplit rules (see Figure 2).

The MacroSplit algorithm can be seen as a "high-level" description of the parallel insertion algorithm given by W. Paul, U. Vishkin and H. Wagener in [PVW83], which is obtained by splitting each MacroSplit step into several more basic steps chained together in a pipeline.

Let us see why we have several MacroSplit algorithms for a large k. At most k keys can reach a node. If the node then stores more than two keys, it must split using a MacroSplit rule. Table 1 shows several split possibilities for x and y bottom nodes. For instance, the first row shows the splits of the x and y nodes when k = 1 (see Figure 1). In this case there is just one possibility. The fourth row shows how x and y nodes can be split when k = 4. In this case a bottom node x can be split into 3 nodes x or into 2 nodes y. Later on we will consider two extreme cases. The MaxMacroSplit algorithm will maximize the number of splits at each step and the MinMacroSplit algorithm will minimize this number. When k = 1 or 2 both algorithms coincide (see Table 1).
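Table 1 can be regenerated from a simple counting rule, which is our own reading and not a rule stated in the text: a node holding c keys that receives k more ends up with c + k keys; splitting it into a x nodes and b y nodes sends a + b - 1 separator keys to the parent, so a + 2b + (a + b - 1) = c + k, i.e. 2a + 3b = c + k + 1. Since x nodes have 2 leaves and y nodes 3, this says the node's leaves grow by exactly k. The function names in the sketch are hypothetical.

```python
def split_options(keys_before: int, k: int) -> list[str]:
    """Possible bottom-node outcomes when a node holding `keys_before` keys receives k keys.

    Splitting into a x nodes (1 key each) and b y nodes (2 keys each) sends
    a + b - 1 separator keys to the parent, so a + 2b + (a + b - 1) must equal
    keys_before + k, i.e. 2a + 3b = keys_before + k + 1.
    """
    total = keys_before + k
    options = [
        "x" * a + "y" * b
        for b in range(total + 1)
        for a in range(total + 1)
        if a + b >= 1 and 2 * a + 3 * b == total + 1
    ]
    return sorted(options, key=len, reverse=True)  # most nodes (Max) first, fewest (Min) last


def macro_split(node: str, k: int, rule: str = "min") -> str:
    """Outcome chosen by MaxMacroSplit (rule="max") or MinMacroSplit (rule="min")."""
    options = split_options(1 if node == "x" else 2, k)
    return options[0] if rule == "max" else options[-1]


for k in range(1, 7):
    print(k, " or ".join(split_options(1, k)), "|", " or ".join(split_options(2, k)))
```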

Consider that at step t + 1, k random keys (we assume a uniform distribution of them) fall in parallel into a fringe with $X_t$ leaves of 1-type and $Y_t$ leaves of 2-type, such that $X_t + Y_t = n + 1$. The expected values of $X_{t+1}$ and $Y_{t+1}$ after the insertions depend on two facts.
(·,·)       P(·,·)                          E(X_{t+1} | X_t, Y_t, 2, (·,·))    E(Y_{t+1} | X_t, Y_t, 2, (·,·))
(x, x)      X_t/(n+1) · 2/(n+1)             X_t + 2                            Y_t
(x1, x2)    X_t/(n+1) · (X_t - 2)/(n+1)     X_t - 4                            Y_t + 6
(x, y)      2 X_t Y_t/(n+1)^2               X_t + 2                            Y_t
(y, y)      Y_t/(n+1) · 3/(n+1)             X_t + 2                            Y_t
(y1, y2)    Y_t/(n+1) · (Y_t - 3)/(n+1)     X_t + 8                            Y_t - 6

Table 2. Parallel insertion of two keys



— The concrete form of the MacroSplit algorithm. This algorithm specifies how many leaves of 1-type and 2-type will be generated by bottom nodes when they receive some number of keys.

— The preceding values of $X_t$ and $Y_t$.

We deal with a Markov chain, and the evolution can be analyzed through the so-called k-OneStep transition matrix $T_{n,k}$:

$$\begin{pmatrix} E(X_{t+1} \mid k) \\ E(Y_{t+1} \mid k) \end{pmatrix} = T_{n,k}\begin{pmatrix} E(X_t \mid k) \\ E(Y_t \mid k) \end{pmatrix}.$$

4 Parallel insertion of 2 and 3 keys 

In this section we compute $T_{n,2}$ and $T_{n,3}$ following directly the technique applied before to sequential insertions [EZG+82], and we discuss the viability of this approach.



4.1 Direct extensions 

First, let us consider the case k = 2. We have only one MacroSplit algorithm (see Table 1). The expected number of leaves is characterized by the 2-OneStep transition matrix $T_{n,2}$:

$$\begin{pmatrix} E(X_{t+1} \mid 2) \\ E(Y_{t+1} \mid 2) \end{pmatrix} = T_{n,2}\begin{pmatrix} E(X_t \mid 2) \\ E(Y_t \mid 2) \end{pmatrix}.$$

We compute the probabilities of the different splits by an exhaustive case analysis (see Table 2). As at most two keys can reach the same bottom node, we have no choice in the split, i.e. the transformation of bottom nodes is unique (second row of Table 1). Both keys can be either at the same bottom node or at different bottom nodes, and in each case bottom nodes can be of type x or y. Let $P(x, x)$ be the probability that both keys reach the same x node, $P(x_1, x_2)$ the probability that they reach different x nodes, and so on for the remaining probabilities $P(x, y)$, $P(y, y)$ and $P(y_1, y_2)$. We denote the generic case as $P(\cdot, \cdot)$, $(\cdot, \cdot)$ being the generic pair of nodes accessed.

As $E(X_{t+1}\mid 2) = E(E(X_{t+1}\mid X_t,Y_t,2))$, we compute the expected number
of 1-type leaves as

$$E(X_{t+1}\mid X_t,Y_t,2) = \sum_{(\cdot,\cdot)} P(\cdot,\cdot)\,E(X_{t+1}\mid X_t,Y_t,2,(\cdot,\cdot)),$$

where $E(X_{t+1}\mid X_t,Y_t,2,(\cdot,\cdot))$ is the expected number of 1-type leaves when the 2 keys reach
the nodes $(\cdot,\cdot)$, conditioned on $X_t$ and $Y_t$. For instance, if the two keys reach different x
nodes, then

$$P(x_1,x_2) = \frac{X_t}{n+1}\cdot\frac{X_t-2}{n+1}$$

and $E(X_{t+1}\mid X_t,Y_t,2,(x_1,x_2)) = X_t - 4$ (Table 2 contains the other values). In
the appendix we give the proof of the following lemma.



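Lemma 1. The conditional expectations verify

$$E(X_{t+1}\mid X_t,Y_t,2) = \Big(1 - \frac{4}{n+1} + \frac{12}{(n+1)^2}\Big)X_t + \Big(\frac{8}{n+1} - \frac{18}{(n+1)^2}\Big)Y_t,$$

$$E(Y_{t+1}\mid X_t,Y_t,2) = \Big(\frac{6}{n+1} - \frac{12}{(n+1)^2}\Big)X_t + \Big(1 - \frac{6}{n+1} + \frac{18}{(n+1)^2}\Big)Y_t.$$

As the conditional expectations are linear in $X_t$ and $Y_t$, and $E(X_{t+1}\mid 2) =
E(E(X_{t+1}\mid X_t,Y_t,2))$, $E(Y_{t+1}\mid 2) = E(E(Y_{t+1}\mid X_t,Y_t,2))$, we have: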



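Lemma 2. The 2-OneStep transition matrix is:

$$T_{n,2} = \Big(1 + \frac{2}{n+1}\Big)I + \frac{2}{n+1}\begin{pmatrix}-3 & 4\\ 3 & -4\end{pmatrix} + \frac{1}{(n+1)^2}\begin{pmatrix}12 & -18\\ -12 & 18\end{pmatrix}.$$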
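Table 2 and Lemma 2 can be cross-checked by brute force. The Python sketch below is our own illustration, not the authors' procedure: it assumes, as Table 2 does, that each of the two keys independently lands to the left of one of the n + 1 leaves chosen uniformly, enumerates all $(n+1)^2$ ordered pairs of positions in a small concrete fringe, and compares the exact expectations with $T_{n,2}$ applied to $(X_t, Y_t)$. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# (1-type, 2-type) leaves left by one bottom node after it receives i <= 2 keys.
# For i <= 2 the split is forced (second row of Table 1).
AFTER_X = {0: (2, 0), 1: (0, 3), 2: (4, 0)}
AFTER_Y = {0: (0, 3), 1: (4, 0), 2: (2, 3)}


def expected_after_two_keys(num_x, num_y):
    """Exact (E(X_{t+1}), E(Y_{t+1})) for a fringe with num_x x nodes and num_y y nodes."""
    owner = [("x", j) for j in range(num_x) for _ in range(2)]
    owner += [("y", j) for j in range(num_y) for _ in range(3)]
    leaves = len(owner)  # this is n + 1
    sum_x = sum_y = 0
    for a, b in product(range(leaves), repeat=2):
        hits = {}  # bottom node -> number of keys it receives
        for leaf in (a, b):
            hits[owner[leaf]] = hits.get(owner[leaf], 0) + 1
        x_new, y_new = 2 * num_x, 3 * num_y
        for (kind, _), i in hits.items():
            before = (2, 0) if kind == "x" else (0, 3)
            after = (AFTER_X if kind == "x" else AFTER_Y)[i]
            x_new += after[0] - before[0]
            y_new += after[1] - before[1]
        sum_x += x_new
        sum_y += y_new
    total = leaves * leaves
    return Fraction(sum_x, total), Fraction(sum_y, total)


def lemma2(x_t, y_t):
    """T_{n,2} applied to (X_t, Y_t), with n + 1 = X_t + Y_t."""
    m = x_t + y_t
    d = 1 + Fraction(2, m)
    ex = d * x_t + Fraction(2, m) * (-3 * x_t + 4 * y_t) + Fraction(12 * x_t - 18 * y_t, m * m)
    ey = d * y_t + Fraction(2, m) * (3 * x_t - 4 * y_t) + Fraction(-12 * x_t + 18 * y_t, m * m)
    return ex, ey


print(expected_after_two_keys(3, 2), lemma2(6, 6))
```

Exact `Fraction` arithmetic keeps the comparison free of rounding; the fringe with 3 x nodes and 2 y nodes corresponds to $X_t = 6$, $Y_t = 6$.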



Consider briefly the case $k = 3$. Now there are two possibilities (third row of
Table 1). We have selected the second transformation, which corresponds to the
MinMacroSplit algorithm. As before, an exhaustive case analysis gives us:

Lemma 3. In the case of the MinMacroSplit algorithm, the 3-OneStep transition
matrix $T_{n,3}$ is:

$$T_{n,3} = \Big(1 + \frac{3}{n+1}\Big)I + \frac{3}{n+1}\begin{pmatrix}-3 & 4\\ 3 & -4\end{pmatrix} + \frac{3}{(n+1)^2}\begin{pmatrix}12 & -18\\ -12 & 18\end{pmatrix} + \frac{1}{(n+1)^3}\begin{pmatrix}-48 & 54\\ 48 & -54\end{pmatrix}.$$



4.2 Discussion of the cases 2 and 3

From the preceding cases we can point out several facts and questions:

1. The exhaustive case analysis (generalizing the sequential approach [EZG+82])
becomes intractable for larger k, k = 5, 6, ....

2. For k = 1, 2, 3 the expectations $E(X_{t+1}\mid X_t,Y_t,k)$ and $E(Y_{t+1}\mid X_t,Y_t,k)$
are linear in $X_t$ and $Y_t$. It is unclear why the non-linear terms always disappear.
Note that we assume this point of view in the equation defining the k-OneStep transition
matrix $T_{n,k}$.

3. The intuitive meaning of the coefficients appearing in the expectations is
unclear. For instance, the term $1 - \frac{4}{n+1} + \frac{12}{(n+1)^2}$ appearing in
$E(X_{t+1}\mid X_t,Y_t,2)$ in Lemma 1 does not have any direct explanation in terms of the
MacroSplit algorithm.

4. By local behavior of the algorithm we mean what happens when i keys hit just
one bottom node x or y (Table 1). By global behavior we mean the evolution
of $X_t$ and $Y_t$. The previous exhaustive analysis does not give a clear cut
between the local and the global behavior of the MacroSplit algorithm.

5. Note that

- Lemmas 2 and 3 can be envisaged as a power expansion of the transition matrix
in $(n+1)^{-1}$.

- The matrices appearing when k = 2 also appear for k = 3 (see Lemmas 2
and 3).

This suggests a power expansion of the k-OneStep matrix of the form

$$T_{n,k} = \Big(1 + \frac{\gamma_1(k)}{n+1}\Big)I + \frac{\gamma_1(k)}{n+1}\begin{pmatrix}-3 & 4\\ 3 & -4\end{pmatrix} + \frac{\gamma_2(k)}{(n+1)^2}\begin{pmatrix}12 & -18\\ -12 & 18\end{pmatrix} + \frac{\gamma_3(k)}{(n+1)^3}\begin{pmatrix}-48 & 54\\ 48 & -54\end{pmatrix} + \cdots$$

Moreover, a little thought suggests $\gamma_i(k) = \binom{k}{i}$.

6. The different coefficients appearing in the matrices reflect the behavior of
the MacroSplit algorithm. We search for a precise meaning of this intuitive
fact.



In the following we answer all these questions.



5 Behavior of the MacroSplit algorithms 

In order to study the expected behavior of an x or y node belonging to a fringe
of n + 1 leaves when k keys are inserted at a given step, we need to know the
characteristics of the MacroSplit algorithm we are using.



5.1 Local behavior 

We would like to know how many 1-type and 2-type leaves are generated when
i keys fall in the same step into a single node x or y. To deal with this, we
introduce the following definition.

Definition 4. At the bottom level, the local behavior of the MacroSplit algo-
rithm is given by the following functions:

- $X_x(i)$ is the number of 1-type leaves after the insertion of i keys into a
single x node (for instance, $X_x(0) = 2$, $X_x(1) = 0$, ...). In the same way,
$X_y(i)$ is the number of 1-type leaves after the insertion of i keys into a y
node (for instance, $X_y(0) = 0$, $X_y(1) = 4$, ...).

- Dually, $Y_x(i)$ is the number of 2-type leaves after the insertion of i keys into
an x node. Finally, $Y_y(i)$ is the number of 2-type leaves after the insertion
of i keys into a y node.

These coefficients verify $X_x(i) + Y_x(i) = 2 + i$ and $X_y(i) + Y_y(i) = 3 + i$.



5.2 Distribution function 



Assume that k random keys fall (in parallel) into a fringe having n + 1 leaves.
First of all, let us isolate just one bottom node x and one key to insert. For a fixed x,
the node has two leaves, and one new key can be inserted into it in two different
positions (corresponding to the left of each leaf). Therefore a single key hits a given
node x with probability $2/(n+1)$.

Now we consider what happens with this node x when k keys are inserted.
Let $N_x$ be a random variable denoting the number of keys falling into a fixed
bottom node x. As the set of k keys is random, this variable follows the binomial
distribution

$$P\{N_x = i\} = \binom{k}{i}\Big(\frac{2}{n+1}\Big)^i\Big(1 - \frac{2}{n+1}\Big)^{k-i} = b\Big(i,k,\frac{2}{n+1}\Big),$$

where $b(i,k,p) = \binom{k}{i}p^i(1-p)^{k-i}$. Recall that the expected value is $kp$.



5.3 Expected local behavior 

The number of 1-type leaves generated by the keys falling into a single node x
is given by the random variable $X_x = X_x(N_x)$. The expected number of 1-type
leaves generated by one x bottom node when a batch of k keys is inserted into
a fringe having n + 1 leaves is:

$$E(X_x\mid k) = \sum_{i=0}^{k} P\{N_x = i\}\,X_x(i) = \sum_{i=0}^{k} b\Big(i,k,\frac{2}{n+1}\Big)X_x(i).$$

The number of 2-type leaves generated by the node x is $Y_x = Y_x(N_x)$, so its
expected value is

$$E(Y_x\mid k) = \sum_{i=0}^{k} b\Big(i,k,\frac{2}{n+1}\Big)Y_x(i).$$

Let us fix a bottom node y. In this case, one key hits this node with probability
$3/(n+1)$. Let $N_y$ be another random variable denoting the number of keys falling
into this node y; clearly $P\{N_y = i\} = b(i,k,3/(n+1))$. In this case we have the
random variables $X_y = X_y(N_y)$ and $Y_y = Y_y(N_y)$, with expected values

$$E(X_y\mid k) = \sum_{i=0}^{k} b\Big(i,k,\frac{3}{n+1}\Big)X_y(i), \qquad E(Y_y\mid k) = \sum_{i=0}^{k} b\Big(i,k,\frac{3}{n+1}\Big)Y_y(i).$$

Note that these expected values depend on the concrete local behavior of the
algorithm. The total number of leaves does not: by Definition 4, the expected number
of leaves generated by just one bottom node when k random keys are inserted in parallel
into a fringe having n + 1 leaves is

$$E(X_x + Y_x\mid k) = 2\Big(1 + \frac{k}{n+1}\Big) \quad\text{and}\quad E(X_y + Y_y\mid k) = 3\Big(1 + \frac{k}{n+1}\Big).$$
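The two identities above only use $X_x(i) + Y_x(i) = 2 + i$ and $X_y(i) + Y_y(i) = 3 + i$, so they hold for any MacroSplit algorithm. A small exact check in Python, with a hypothetical helper name of our own:

```python
from fractions import Fraction
from math import comb


def expected_total_leaves(node_leaves, n, k):
    """E(X + Y | k) for one bottom node with node_leaves leaves (2 for x, 3 for y).

    A node hit by i keys ends with node_leaves + i leaves (Definition 4); it is hit
    by each key independently with probability node_leaves / (n + 1).
    """
    p = Fraction(node_leaves, n + 1)
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i) * (node_leaves + i)
               for i in range(k + 1))


print(expected_total_leaves(2, 9, 4), expected_total_leaves(3, 9, 4))
```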



5.4 Global behavior 

We relate the local behavior to the global one by means of the transition
matrix:

Definition 5. Given a fringe with n + 1 leaves and a MacroSplit algorithm, we
define the k-OneStep transition matrix as:

$$T_{n,k} = \begin{pmatrix} \frac{1}{2}E(X_x\mid k) & \frac{1}{3}E(X_y\mid k)\\ \frac{1}{2}E(Y_x\mid k) & \frac{1}{3}E(Y_y\mid k)\end{pmatrix}.$$

The proof of the following lemma is given in the appendix.

Lemma 6. Given a fringe with $X_t$ leaves of 1-type and $Y_t$ leaves of 2-type, when
k random keys are inserted into it in one step we have

$$\begin{pmatrix} E(X_{t+1}\mid X_t,Y_t,k)\\ E(Y_{t+1}\mid X_t,Y_t,k)\end{pmatrix} = T_{n,k}\begin{pmatrix} X_t\\ Y_t\end{pmatrix}.$$
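Definition 5 and Lemma 6 turn the local behavior directly into the global transition matrix. The Python sketch below (hypothetical helper names, exact arithmetic) builds $T_{n,k}$ from the binomial laws of Section 5.3, using the MinMacroSplit local behavior listed in Lemma 13, and compares it for k = 3 with the closed form of Lemma 3.

```python
from fractions import Fraction
from math import comb


def min_macro_split(i):
    """MinMacroSplit local behavior (Lemma 13): ((X_x(i), Y_x(i)), (X_y(i), Y_y(i)))."""
    r = i % 3
    x_node = {0: (2, i), 1: (0, i + 2), 2: (4, i - 2)}[r]
    y_node = {0: (0, i + 3), 1: (4, i - 1), 2: (2, i + 1)}[r]
    return x_node, y_node


def b(i, k, p):
    """Binomial probability b(i, k, p)."""
    return comb(k, i) * p**i * (1 - p) ** (k - i)


def one_step_matrix(n, k, local):
    """k-OneStep matrix of Definition 5, as a 2x2 list of Fractions."""
    px, py = Fraction(2, n + 1), Fraction(3, n + 1)
    exx = eyx = exy = eyy = Fraction(0)
    for i in range(k + 1):
        (xx, yx), (xy, yy) = local(i)
        exx += b(i, k, px) * xx
        eyx += b(i, k, px) * yx
        exy += b(i, k, py) * xy
        eyy += b(i, k, py) * yy
    return [[exx / 2, exy / 3], [eyx / 2, eyy / 3]]


def lemma3(n):
    """Closed form of T_{n,3} stated in Lemma 3."""
    m = Fraction(n + 1)
    terms = [(3 / m, [[-3, 4], [3, -4]]),
             (3 / m**2, [[12, -18], [-12, 18]]),
             (1 / m**3, [[-48, 54], [48, -54]])]
    t = [[1 + 3 / m, Fraction(0)], [Fraction(0), 1 + 3 / m]]
    for c, mat in terms:
        for r in range(2):
            for s in range(2):
                t[r][s] += c * mat[r][s]
    return t


print(one_step_matrix(10, 3, min_macro_split) == lemma3(10))
```

The local-behavior function is passed as a parameter, so the same builder can be reused for any split rule that satisfies Definition 4.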

The proof of the following theorem is given in the appendix.

Theorem 7. When k random keys are inserted in one step we have:

$$\begin{pmatrix} E(X_{t+1}\mid k)\\ E(Y_{t+1}\mid k)\end{pmatrix} = T_{n,k}\begin{pmatrix} E(X_t\mid k)\\ E(Y_t\mid k)\end{pmatrix}.$$

Following note 5 of Section 4.2, where we conjectured a power expansion form
for the transition matrix, it would be useful to write the k-OneStep transition
matrix (Definition 5) as a multiple of the identity I plus a correction term. We can
prove:



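Lemma 8. Let I be the two-dimensional identity matrix. The k-OneStep matrix verifies:

$$T_{n,k} = \Big(1 + \frac{k}{n+1}\Big)I + \begin{pmatrix} -\frac{1}{2}E(Y_x\mid k) & \frac{1}{3}E(X_y\mid k)\\ \frac{1}{2}E(Y_x\mid k) & -\frac{1}{3}E(X_y\mid k)\end{pmatrix}.$$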



5.5 Power expansion on the transition matrix 

Let us recall the binomial transform $\mathcal{B}$ recently developed by P. Poblete, J. Munro
and Th. Papadakis [PMP95]. Let $(F_i)_{i\ge0}$ be a sequence of real numbers; the
binomial transform is the sequence $(\hat F_j)_{j\ge0}$ defined as

$$\hat F_j = \mathcal{B}_j F = \sum_{i=0}^{j} (-1)^i\binom{j}{i}F_i.$$

This transformation verifies $F_i = \mathcal{B}_i\hat F$, i.e. it is an involution. In the following we will use
the following weighted form of the binomial transforms of $(Y_x(i))_{i\ge0}$ and $(X_y(i))_{i\ge0}$:

Definition 9. Consider the coefficients $\alpha_j = -2^{j-1}\hat Y_x(j)$ and $\beta_j = -3^{j-1}\hat X_y(j)$.
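Definition 9 is easy to evaluate mechanically. The Python sketch below implements the binomial transform as defined above and the weighted coefficients $\alpha_j$, $\beta_j$; the local-behavior functions are transcribed from Lemmas 12 and 13, and all names are our own.

```python
from fractions import Fraction
from math import comb


def binomial_transform(seq, j):
    """hat F_j = sum_{i=0}^{j} (-1)^i C(j, i) F_i, for a sequence given as a function."""
    return sum((-1) ** i * comb(j, i) * seq(i) for i in range(j + 1))


def alpha(y_x, j):
    """alpha_j = -2^(j-1) * hat Y_x(j); Fraction handles the 2^(-1) at j = 0."""
    return -Fraction(2) ** (j - 1) * binomial_transform(y_x, j)


def beta(x_y, j):
    """beta_j = -3^(j-1) * hat X_y(j)."""
    return -Fraction(3) ** (j - 1) * binomial_transform(x_y, j)


def max_y_x(i):  # MaxMacroSplit, Lemma 12
    return 0 if i % 2 == 0 else 3


def max_x_y(i):
    return i if i % 2 == 0 else i + 3


def min_y_x(i):  # MinMacroSplit, Lemma 13
    return {0: i, 1: i + 2, 2: i - 2}[i % 3]


def min_x_y(i):
    return {0: 0, 1: 4, 2: 2}[i % 3]


print([int(alpha(min_y_x, j)) for j in range(4)], [int(beta(min_x_y, j)) for j in range(4)])
```

The first three MinMacroSplit coefficients reproduce the entries 3, 12, 48 and 4, 18, 54 that appear in the matrices of Lemma 3.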

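Let us develop the relationship between the preceding coefficients and the local
expected values in the k-OneStep matrix of Lemma 8. The proof of the
following lemma is given in the appendix.

Lemma 10.

$$E(Y_x\mid k) = -2\sum_{j=0}^{k}(-1)^j\binom{k}{j}\frac{\alpha_j}{(n+1)^j}, \qquad E(X_y\mid k) = -3\sum_{j=0}^{k}(-1)^j\binom{k}{j}\frac{\beta_j}{(n+1)^j}.$$

From Lemmas 8 and 10 we get the following expansion.

Theorem 11. The k-OneStep transition matrix can be rewritten as

$$T_{n,k} = \Big(1 + \frac{k}{n+1}\Big)I + \sum_{j=0}^{k}\frac{(-1)^j\binom{k}{j}}{(n+1)^j}\begin{pmatrix}\alpha_j & -\beta_j\\ -\alpha_j & \beta_j\end{pmatrix}.$$

Note that $T_{n,k} = T_{n,1}^{\,k} + O(1/n^2)$.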
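The expansion of Theorem 11 can be checked against Definition 5 in exact arithmetic for both extreme MacroSplit algorithms of Section 6. The Python sketch below uses hypothetical helper names; the local behaviors are transcribed from Lemmas 12 and 13.

```python
from fractions import Fraction
from math import comb


def bt(seq, j):
    """Binomial transform hat F_j = sum_i (-1)^i C(j, i) F_i."""
    return sum((-1) ** i * comb(j, i) * seq(i) for i in range(j + 1))


def max_local(i):  # Lemma 12: ((X_x, Y_x), (X_y, Y_y))
    return ((i + 2, 0), (i, 3)) if i % 2 == 0 else ((i - 1, 3), (i + 3, 0))


def min_local(i):  # Lemma 13
    r = i % 3
    return ({0: (2, i), 1: (0, i + 2), 2: (4, i - 2)}[r],
            {0: (0, i + 3), 1: (4, i - 1), 2: (2, i + 1)}[r])


def definition5(n, k, local):
    """T_{n,k} built directly from the expected local behavior."""
    px, py = Fraction(2, n + 1), Fraction(3, n + 1)
    e = [Fraction(0)] * 4  # E(X_x), E(Y_x), E(X_y), E(Y_y)
    for i in range(k + 1):
        wx = comb(k, i) * px**i * (1 - px) ** (k - i)
        wy = comb(k, i) * py**i * (1 - py) ** (k - i)
        (xx, yx), (xy, yy) = local(i)
        e = [e[0] + wx * xx, e[1] + wx * yx, e[2] + wy * xy, e[3] + wy * yy]
    return [[e[0] / 2, e[2] / 3], [e[1] / 2, e[3] / 3]]


def theorem11(n, k, local):
    """(1 + k/(n+1)) I + sum_j (-1)^j C(k, j) (n+1)^(-j) [[a_j, -b_j], [-a_j, b_j]]."""
    d = 1 + Fraction(k, n + 1)
    t = [[d, Fraction(0)], [Fraction(0), d]]
    for j in range(k + 1):
        a = -Fraction(2) ** (j - 1) * bt(lambda i: local(i)[0][1], j)  # alpha_j
        b = -Fraction(3) ** (j - 1) * bt(lambda i: local(i)[1][0], j)  # beta_j
        w = (-1) ** j * comb(k, j) / Fraction(n + 1) ** j
        t[0][0] += w * a
        t[0][1] -= w * b
        t[1][0] -= w * a
        t[1][1] += w * b
    return t


print(theorem11(10, 4, min_local) == definition5(10, 4, min_local))
```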

6 Two extreme MacroSplit algorithms 

We have shown that the k-OneStep transition matrix depends on the concrete
MacroSplit algorithm. In this section we develop two extreme cases of this algo-
rithm: one, denoted MaxMacroSplit, that makes the maximum number
of splits and creates the maximum number of x nodes, and another, denoted Min-
MacroSplit, that makes the minimum number of splits and creates the
maximum number of y nodes. These two extreme cases seem to bound the be-
havior of the whole pipeline in the insertion algorithm of W. Paul, U. Vishkin and
H. Wagener [PVW83].

6.1 The MaxMacroSplit and MinMacroSplit algorithms 

Assume that an even number i of keys is attached to a node x (i = 6 in
case 1 of Figure 3). This wide node splits, yielding i + 2 1-type leaves (8
in the preceding case) and no 2-type leaves. Then $X_x(i) = i + 2$ and $Y_x(i) = 0$.

On the other hand, suppose an odd number i of keys is attached (i = 7 in case 2 of
Figure 3). In this case the split creates exactly one y node, so $Y_x(i) = 3$ and
$X_x(i) = i - 1$ (3 and 6, respectively, in the figure). Note that $X_x(i) + Y_x(i) = i + 2$.
We summarize the previous paragraphs in the following lemma.

Fig. 3. Application of MaxMacroSplit rule on a node x

Lemma 12. The MaxMacroSplit algorithm has the following characterization:
(1) The local behavior is given by:

- For even i we have $X_x(i) = i + 2$, $Y_x(i) = 0$, $X_y(i) = i$, $Y_y(i) = 3$.

- For odd i we have $X_x(i) = i - 1$, $Y_x(i) = 3$, $X_y(i) = i + 3$, $Y_y(i) = 0$.

(2) The expected local behavior is, with $q = 1 - p$,

$$E(X_x\mid k) = kp + \tfrac{1}{2} + \tfrac{3}{2}(q-p)^k, \quad E(Y_x\mid k) = \tfrac{3}{2} - \tfrac{3}{2}(q-p)^k \quad\text{for } p = \tfrac{2}{n+1},$$
$$E(X_y\mid k) = kp + \tfrac{3}{2} - \tfrac{3}{2}(q-p)^k, \quad E(Y_y\mid k) = \tfrac{3}{2} + \tfrac{3}{2}(q-p)^k \quad\text{for } p = \tfrac{3}{n+1}.$$

(3) The power expansion verifies $\alpha_0 = \beta_0 = 0$ and $\beta_1 = 4$. For $j > 0$ we have
$\alpha_j = 3\cdot4^{j-1}$, and for $j > 1$ we have $\beta_j = 3\cdot6^{j-1}$.
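The closed forms in part (2) follow from writing the parity-dependent local behavior with $(-1)^i$ and using $E[(-1)^{N}] = (q-p)^k$ for $N \sim B(k,p)$. A Python sketch (names are ours) that compares them with the binomial sums of Section 5.3 in exact arithmetic:

```python
from fractions import Fraction
from math import comb


def expect(f, k, p):
    """sum_i b(i, k, p) f(i)."""
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i) * f(i) for i in range(k + 1))


# MaxMacroSplit local behavior, Lemma 12 (1).
def x_x(i):
    return i + 2 if i % 2 == 0 else i - 1


def y_x(i):
    return 0 if i % 2 == 0 else 3


def x_y(i):
    return i if i % 2 == 0 else i + 3


def y_y(i):
    return 3 if i % 2 == 0 else 0


def lemma12_closed_forms(n, k):
    """Lemma 12 (2); (q - p)^k is computed as (1 - 2p)^k."""
    half, three_half = Fraction(1, 2), Fraction(3, 2)
    px, py = Fraction(2, n + 1), Fraction(3, n + 1)
    sx, sy = (1 - 2 * px) ** k, (1 - 2 * py) ** k
    return (k * px + half + three_half * sx,
            three_half - three_half * sx,
            k * py + three_half - three_half * sy,
            three_half + three_half * sy)


print(lemma12_closed_forms(9, 5))
```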

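When we consider a minimum number of splits we have the following lemma.

Lemma 13. The MinMacroSplit algorithm has the following characterization:

(1) The local behavior is given by:

- For i mod 3 = 0 we have $X_x(i) = 2$, $Y_x(i) = i$, $X_y(i) = 0$, $Y_y(i) = i + 3$.

- For i mod 3 = 1 we have $X_x(i) = 0$, $Y_x(i) = i + 2$, $X_y(i) = 4$, $Y_y(i) = i - 1$.

- For i mod 3 = 2 we have $X_x(i) = 4$, $Y_x(i) = i - 2$, $X_y(i) = 2$, $Y_y(i) = i + 1$.

(2) Let

$$\varphi = \mathrm{Re}\Big(\frac{2 - 3p + p\sqrt{3}\,i}{2}\Big)^k, \qquad \psi = \mathrm{Im}\Big(\frac{2 - 3p + p\sqrt{3}\,i}{2}\Big)^k.$$

The expected local behavior is determined by:

$$E(X_x\mid k) = 2 - \frac{4}{\sqrt{3}}\psi, \quad E(Y_x\mid k) = pk + \frac{4}{\sqrt{3}}\psi \quad\text{for } p = \frac{2}{n+1},$$
$$E(X_y\mid k) = 2 - 2\varphi + \frac{2}{\sqrt{3}}\psi, \quad E(Y_y\mid k) = pk + 1 + 2\varphi - \frac{2}{\sqrt{3}}\psi \quad\text{for } p = \frac{3}{n+1}.$$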



(3) For $j > 2$, the power expansion coefficients of the MinMacroSplit algorithm
verify $\alpha_{j+6} = -12^3\alpha_j$ and $\beta_{j+6} = -27^3\beta_j$.
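The quantity $(2 - 3p + p\sqrt{3}\,i)/2$ equals $q + p\omega$ with $\omega = e^{2\pi i/3}$, so its k-th power is $E[\omega^{N}]$ for $N \sim B(k,p)$; this is what turns the mod-3 local behavior into the closed forms of part (2). A floating-point Python sketch, with names of our own choosing, compares those closed forms with the direct binomial sums:

```python
import math
from math import comb


def min_local(i):
    """MinMacroSplit local behavior (Lemma 13 (1)): ((X_x, Y_x), (X_y, Y_y))."""
    r = i % 3
    return ({0: (2, i), 1: (0, i + 2), 2: (4, i - 2)}[r],
            {0: (0, i + 3), 1: (4, i - 1), 2: (2, i + 1)}[r])


def phi_psi(p, k):
    """Real and imaginary parts of ((2 - 3p + i p sqrt(3)) / 2)^k = E[omega^N]."""
    z = (complex(2 - 3 * p, p * math.sqrt(3)) / 2) ** k
    return z.real, z.imag


def closed_forms(n, k):
    px, py = 2 / (n + 1), 3 / (n + 1)
    _, psi_x = phi_psi(px, k)
    phi_y, psi_y = phi_psi(py, k)
    c = 2 / math.sqrt(3)
    return (2 - 2 * c * psi_x, px * k + 2 * c * psi_x,
            2 - 2 * phi_y + c * psi_y, py * k + 1 + 2 * phi_y - c * psi_y)


def direct(n, k):
    px, py = 2 / (n + 1), 3 / (n + 1)
    out = [0.0] * 4
    for i in range(k + 1):
        wx = comb(k, i) * px**i * (1 - px) ** (k - i)
        wy = comb(k, i) * py**i * (1 - py) ** (k - i)
        (xx, yx), (xy, yy) = min_local(i)
        out[0] += wx * xx
        out[1] += wx * yx
        out[2] += wy * xy
        out[3] += wy * yy
    return tuple(out)


print(closed_forms(12, 7))
print(direct(12, 7))
```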
6.2 Relationship with PVW's algorithm

Let us see how the MaxMacroSplit and MinMacroSplit algorithms bound the fringe behavior
of the insertion algorithm given in [PVW83]. There, a macro step contains the
whole insertion of the k keys. Let $(X_t, Y_t)$ be the fringe distribution before
the pipeline starts and let $(X_{t+1}, Y_{t+1})$ be the fringe once the pipeline has
finished. A rough bound is given in the following conjecture.

Conjecture 14. Let $(X^{MaxSplit}_{t+1}, Y^{MaxSplit}_{t+1})$ be the fringe in the MaxMacroSplit
algorithm, let $(X^{MinSplit}_{t+1}, Y^{MinSplit}_{t+1})$ be the fringe in the MinMacroSplit algorithm,
and let $(X^{PVW}_{t+1}, Y^{PVW}_{t+1})$ be the fringe in the macro step algorithm of W. Paul,
U. Vishkin and H. Wagener. Then

$$E(X^{MinSplit}_{t+1}\mid k) \le E(X^{PVW}_{t+1}\mid k) \le E(X^{MaxSplit}_{t+1}\mid k),$$
$$E(Y^{MaxSplit}_{t+1}\mid k) \le E(Y^{PVW}_{t+1}\mid k) \le E(Y^{MinSplit}_{t+1}\mid k).$$

7 Conclusion 

We have analyzed the MacroSplit parallel insertion algorithms (Theorem 7) and
we have proved that the coefficients of the k-OneStep matrix, which determine the global
behavior of the algorithm, are given by the expected local behavior. We have de-
veloped the power expansion (Theorem 11), proving that the MacroSplit algorithm
can be approximated by the iterated sequential algorithm with an error of order
$O(1/n^2)$ (n being the size of the tree). The coefficients of the expansion
are proportional to the binomial transform of the expected local behavior.

We have conjectured (Conjecture 14) that the [PVW83] algorithm is bounded
by the two extreme algorithms MaxMacroSplit and MinMacroSplit, and we have
computed (Lemmas 12 and 13) the main values of these algorithms. In the limiting
case (very large trees) all these algorithms have the same performance.

Acknowledgement: We thank M. Sanchez Busquets for pointing out an error in
an earlier version of the paper.
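A Appendix

A.1 Proof of Lemma 1

The conditional expectation $E(X_{t+1}\mid X_t,Y_t,2)$ is:

$$\begin{aligned}
E(X_{t+1}\mid X_t,Y_t,2) &= \sum_{(\cdot,\cdot)} P(\cdot,\cdot)\,E(X_{t+1}\mid X_t,Y_t,2,(\cdot,\cdot))\\
&= \frac{1}{(n+1)^2}\big(2X_t(X_t+2) + X_t(X_t-2)(X_t-4) + 2X_tY_t(X_t+2)\\
&\qquad + 3Y_t(X_t+2) + Y_t(Y_t-3)(X_t+8)\big)\\
&= X_t + \frac{1}{(n+1)^2}\big(12X_t - 4X_t^2 + 4X_tY_t + 8Y_t^2 - 18Y_t\big)\\
&= X_t + \frac{1}{(n+1)^2}\big(12X_t - 4X_t(X_t+Y_t) + 8Y_t(X_t+Y_t) - 18Y_t\big).
\end{aligned}$$

Since $X_t + Y_t = n + 1$, the last expression equals $X_t + \frac{-4X_t + 8Y_t}{n+1} + \frac{12X_t - 18Y_t}{(n+1)^2}$,
which is the stated formula. The $Y_{t+1}$ term has a similar development.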

A.2 Proof of Lemma 6

Let us consider a fringe having $X_t$ leaves of 1-type and $Y_t$ leaves of 2-type, with
$X_t + Y_t = n + 1$. In this fringe, the number of x bottom nodes is $X_t/2$ and the
number of y bottom nodes is $Y_t/3$. Now we insert k random keys in just one step
and we are interested in the value of $X_{t+1}$. Let $M_{x,i}$ be the number of x nodes
getting i keys and let $M_{y,i}$ be the number of y nodes getting i keys. We have

$$X_{t+1} = \sum_{i=0}^{k} M_{x,i}X_x(i) + \sum_{i=0}^{k} M_{y,i}X_y(i).$$

Recall that $X_t$ and $Y_t$ are fixed. An x node gets i keys with probability
$P\{N_x = i\}$ and there are $X_t/2$ x nodes, so $M_{x,i}$ is the sum of $X_t/2$ indicator
variables, each equal to 1 with probability $P\{N_x = i\mid k\}$. By linearity of expectation,
the expected number of x nodes receiving exactly i keys each is
$E(M_{x,i}\mid X_t,Y_t,k) = P\{N_x = i\mid k\}\,X_t/2$. Similarly, the expected number of y
nodes receiving exactly i keys is $E(M_{y,i}\mid X_t,Y_t,k) = P\{N_y = i\mid k\}\,Y_t/3$.

We study the expected behavior of $X_{t+1}$ when k keys are inserted at random:

$$\begin{aligned}
E(X_{t+1}\mid X_t,Y_t,k) &= \sum_{i=0}^{k} E(M_{x,i}\mid X_t,Y_t,k)X_x(i) + \sum_{i=0}^{k} E(M_{y,i}\mid X_t,Y_t,k)X_y(i)\\
&= \frac{X_t}{2}\sum_{i=0}^{k} P\{N_x = i\mid k\}X_x(i) + \frac{Y_t}{3}\sum_{i=0}^{k} P\{N_y = i\mid k\}X_y(i)\\
&= E(X_x\mid k)\frac{X_t}{2} + E(X_y\mid k)\frac{Y_t}{3}.
\end{aligned}$$

The computation for $E(Y_{t+1}\mid X_t,Y_t,k)$ is similar.

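A.3 Proof of Theorem 7

From the preceding lemma we have

$$E(X_{t+1}\mid X_t,Y_t,k) = E(X_x\mid k)\frac{X_t}{2} + E(X_y\mid k)\frac{Y_t}{3}.$$

As $E(X_{t+1}\mid k) = E(E(X_{t+1}\mid X_t,Y_t,k)\mid k)$, we have

$$E(X_{t+1}\mid k) = \frac{1}{2}E(E(X_x\mid k)X_t\mid k) + \frac{1}{3}E(E(X_y\mid k)Y_t\mid k).$$

As $X_x$ and $X_t$ are independent, $E(E(X_x\mid k)X_t\mid k) = E(X_x\mid k)E(X_t\mid k)$ (and similarly for
the second term), and the proof is done.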



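A.4 Proof of Lemma 10

Recall that

$$E(Y_x\mid \ell) = \sum_{i=0}^{\ell} b\Big(i,\ell,\frac{2}{n+1}\Big)Y_x(i).$$

Consider the sequence $(E(Y_x\mid \ell))_{\ell\ge0}$. Given $p + q = 1$, the binomial transform
verifies $\mathcal{B}_j\big(\sum_i \binom{\ell}{i}p^iq^{\ell-i}F_i\big) = p^j\hat F_j$; therefore

$$\mathcal{B}_jE(Y_x\mid \ell) = \Big(\frac{2}{n+1}\Big)^j\mathcal{B}_jY_x(\ell) = \Big(\frac{2}{n+1}\Big)^j\hat Y_x(j).$$

Now we apply the property $F_k = \mathcal{B}_k\mathcal{B}_jF$ and get

$$E(Y_x\mid k) = \mathcal{B}_k\mathcal{B}_jE(Y_x\mid \ell) = \mathcal{B}_k\Big(\Big(\frac{2}{n+1}\Big)^j\hat Y_x(j)\Big).$$

Using linearity and $\alpha_j = -2^{j-1}\hat Y_x(j)$, we have

$$E(Y_x\mid k) = 2\mathcal{B}_k\Big(\Big(\frac{1}{n+1}\Big)^j2^{j-1}\hat Y_x(j)\Big) = -2\mathcal{B}_k\Big(\Big(\frac{1}{n+1}\Big)^j\alpha_j\Big) = -2\sum_{j=0}^{k}(-1)^j\binom{k}{j}\frac{\alpha_j}{(n+1)^j}.$$

The case $E(X_y\mid k)$ is quite similar.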
Abstract. We consider the problem of extending the analysis of balls
and bins processes where a ball is placed in the least loaded of d ran-
domly chosen bins to cover deletions. In particular, we are interested
in the case where the system maintains a fixed load, and deletions are
determined by an adversary before the process begins. We show that
with high probability the load in any bin is O(log log n). In fact, this
result follows from recent work by Cole et al. concerning a more difficult
problem of routing in a butterfly network.

The main contribution of this paper is to give a different proof of this
bound, which follows the lines of the analysis of Azar, Broder, Karlin,
and Upfal for the corresponding static load balancing problem. We also
give a specialized (and hence simpler) version of the argument from the
paper by Cole et al. for the balls and bins scenario. Finally, we provide
an alternative analysis, also based on the approach of Azar, Broder, Kar-
lin, and Upfal, for the special case where items are deleted according to
their age. Although this analysis does not yield better bounds than our
argument for the general case, it is interesting because it utilizes a two-
dimensional family of random variables in order to account for the age
of the items. This technique may be of more general use.

M. Luby, J. Rolim, and M. Serna (Eds.): RANDOM'98, LNCS 1518, pp. 145-158, 1998.
© Springer-Verlag Berlin Heidelberg 1998
R. Cole et al.



1 Introduction 



A standard question in load balancing is to consider what the distribution of
balls in bins looks like when m balls are thrown into n bins. In particular,
when n balls are thrown into n bins, it is well known that the maximum load
is approximately log n/log log n with high probability. The seminal paper of
Azar, Broder, Karlin, and Upfal asked a related question: suppose the balls are
placed sequentially, and each ball is placed in the least loaded of d bins chosen
independently and uniformly at random [4]. In this case, they find that the
maximum load is log log n/log d + O(1) with high probability; a more detailed
analysis of the distribution in this case is undertaken in [10]. This work has
led to a number of papers analyzing related load balancing schemes, including for
example [1,2,5-9,11,12].

Note that the above result applies to a static problem, where a fixed number 
of balls are distributed. An interesting related question is to consider the dynamic 
situation where balls can be deleted as well as inserted into the system over time. 
Indeed, the original paper by Azar, Broder, Karlin, and Upfal examines the 
dynamic situation where at each step a random ball is deleted and re-inserted in 
the system [4]. Related work by, for example, Mitzenmacher [8,9, 11] and Adler 
et al. [1] examines deletions via connections with queueing theoretical models. 

Here we focus on a model where an adversary may specify a deletion sequence 
in advance. Our first and main result is to extend the proof of [4] to handle a 
polynomial length sequence of insertions and deletions, where the maximum load 
in the system is always at most n balls. We then note that an even more general 
result, in which re-insertions can occur, is already essentially contained in the 
results of [6] (a re-insertion causes a ball to choose among the same bins as on 
its first insertion). This work considered a similar problem related to routing on
a butterfly network. We restate this proof for the balls and bins setting, where
it becomes significantly simpler. Finally, we consider a special case in which 
deletions are always of the item that has been longest in the system. We again 
use a variant of the two-choice argument from [4], this time making use of a 
two-dimensional family of random variables, similar in spirit to the work of [11]. 

We emphasize that the interest of this work lies in the techniques used rather 
than the result, which is already implicit in the work of [6]. 
2 Adversarial deletions: polynomially many steps

In this section, we demonstrate that the original proof of Azar, Broder, Kar- 
lin, and Upfal in [4] can be extended to handle deletions under an appropriate 
adversarial model for polynomially many steps. We first define the underlying 
process. 

For a vector $v = (v_1, v_2, \ldots)$, let $P_d(v)$ be the following process: at time
steps 1 through n, n balls are placed into n bins sequentially, with each ball
going into the least loaded of d bins chosen independently and uniformly at
random. After these balls are placed, deletions and insertions alternate, so that
at each subsequent time step n + j, first the ball inserted at time $v_j$ is removed,
and then a new ball is placed into the least loaded of d bins chosen independently
and uniformly at random. (Actually we do not require this alternation; the main
point is that we have a bound, n, on the number of balls in the system at any
point. The alternation merely makes notation more convenient.)

We assume the vector v is suitably defined so that at each step an actual
deletion occurs; that is, the $v_j$ are unique and $v_j \le n + j - 1$. Otherwise v is
arbitrary, although we emphasize that it is chosen before the process begins and
does not depend on the random choices made during the process.
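To make the process concrete, here is a small Python simulation of $P_d(v)$. It is a sketch under our own assumptions: bins are numbered 0 to n - 1, ties between equally loaded choices go to the first one drawn, and the deletion vector $v_j = j$ (always delete the oldest ball) is just one admissible choice fixed before the run. The function name is hypothetical.

```python
import random


def simulate_pd(n, d, v, seed=0):
    """Run P_d(v) and return (max load ever seen, final loads).

    v[j-1] is the insertion time of the ball removed at step n + j; it must be
    fixed before the run and refer to a ball that is still in the system.
    """
    rng = random.Random(seed)
    load = [0] * n
    bin_of = {}  # insertion time -> bin holding that ball
    max_load = 0

    def insert(time):
        nonlocal max_load
        choices = [rng.randrange(n) for _ in range(d)]
        target = min(choices, key=lambda c: load[c])  # least loaded of the d choices
        load[target] += 1
        bin_of[time] = target
        max_load = max(max_load, load[target])

    for t in range(1, n + 1):  # steps 1..n: n sequential insertions
        insert(t)
    for j, v_j in enumerate(v, start=1):
        load[bin_of.pop(v_j)] -= 1  # KeyError if v_j is not currently present
        insert(n + j)
    return max_load, load


n = 2000
max_load, load = simulate_pd(n, d=2, v=list(range(1, 3 * n + 1)))
print(max_load, sum(load))
```

Because v is an ordinary list built before the call, the adversary cannot react to the random choices, matching the oblivious model described above.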

We adopt some of the notation of [4]. Each ball is assigned a fixed height
upon entry, where the height is the number of balls in the bin, including itself.
The height of the ball placed at time t is denoted by h(t). The load of a bin
at time t refers to the number of balls in the bin at that time. We let $\mu_{\ge k}(t)$
denote the number of balls that have height at least k at time t, and $\nu_{\ge k}(t)$ be
the number of bins that have load at least k at time t. Note that if a bin has
load k, it must contain some ball of height at least k. Hence $\mu_{\ge k}(t) \ge \nu_{\ge k}(t)$ for
all times t. Finally, B(n, p) refers to a binomially distributed random variable
based on n trials each with probability p of success.

Before giving the proof, we sketch the main ideas. The flavor of the proof
is to show that with high probability the number of bins containing at least i
balls is doubly exponentially decreasing for sufficiently large i. The bound on the
number of bins containing at least i + 1 balls is obtained from the bound on the
number of bins containing at least i balls. Establishing the proper conditioning
between the number of bins with i and i + 1 balls makes the proof challenging.

A key idea is to avoid seeking a direct bound on the number of bins con- 
taining at least i balls. Rather, following [4], we use the fact that the number of 
balls of height at least i bounds the number of bins containing at least i balls. 
This leads us to obtain bounds on the distribution of ball heights which, with 
high probability, hold for polynomially many steps. A concern is that here the 
adversarial choice of deletions might lead this bound to be too weak. On the 
other hand, the adversary is constrained, for the full sequence of deletions must 
be chosen up front, and this allows the result. 

The key difference between our result and that of [4] is that they find a domi-
nating distribution of heights on one set of n balls, whereas we use a distribution
that applies to every set of n balls present in the system as it evolves. As it hap-
pens, the bounds are essentially the same; the most significant changes lie in
the end game, where we must bound the number of bins containing more than
log log n/log d balls.

Theorem 1. For any fixed constants $c_1$ and $c_2$, with probability at least
$1 - o(1/n^{c_1})$ the maximum load of a bin achieved by process $P_d(v)$ over $T = n^{c_2}$
steps is $\log\log n/\log d + O((c_1 + c_2)/d)$.

Proof. The argument extends the original Theorem 4 of [4], by determining a 
distribution on the heights of the balls that holds for polynomially many steps, 
regardless of which n balls are in the system at any point in time. 

Let $\mathcal{E}_i$ be the event that $\nu_{\ge i}(t) \le \beta_i$ for time steps $t = 1, \ldots, T$, where the $\beta_i$
will be revealed shortly. We want to show that at time t, $1 \le t \le T$,

$$\Pr(\nu_{\ge i+1}(t) > \beta_{i+1} \mid \mathcal{E}_i)$$

is sufficiently small. That is, given $\mathcal{E}_i$, we want $\mathcal{E}_{i+1}$ to hold as well. This proba-
bility is hard to estimate directly. However, we know that since the d choices for
a ball are independent, we have

$$\Pr(h(t) \ge i + 1 \mid \nu_{\ge i}(t-1)) = \Big(\frac{\nu_{\ge i}(t-1)}{n}\Big)^d.$$

We would like to bound, for each time t, the distribution of the number of
time steps j such that $h(j) \ge i + 1$ and the ball inserted at time step j has not
been deleted by time t. In particular, we would like to bound this distribution
by a binomial distribution over n events with success probability $(\beta_i/n)^d$. But
this is difficult to do directly, as the events are not independent.

Instead, we fix i and define the binary random variables $Y_t$ for $t = 1, \ldots, T$,
where

$$Y_t = 1 \iff h(t) \ge i + 1 \text{ and } \nu_{\ge i}(t-1) \le \beta_i.$$

(The value $Y_t$ is 1 if and only if the height of ball t is at least i + 1 despite
the fact that the number of bins that have load at least i is currently at most
$\beta_i$.)

Let $\omega_j$ represent the choices available to the j'th ball. Clearly

$$\Pr(Y_t = 1 \mid \omega_1, \ldots, \omega_{t-1}) \le \frac{\beta_i^d}{n^d} = p_i.$$

Consider the situation immediately after a time step t' where a new ball has
entered the system. Then there are n balls in the system, which entered at times
$u_1, u_2, \ldots, u_n$. Let $I(t')$ be the set of times $\{u_1, u_2, \ldots, u_n\}$. Then

$$\sum_{t\in I(t')} Y_t = \sum_{i=1}^{n} Y_{u_i};$$

that is, the summation over $I(t')$ is implicitly over the values of $Y_t$ for the balls in
the system at time t'. (This statement differs from the result of [4]; the important
point here is that we can bound this sum regardless of which n balls are in the
system. Note that these balls are fixed for any time t by the deletion sequence v.)
We may conclude that at any time $t' \le T$

$$\Pr\Big(\sum_{t\in I(t')} Y_t \ge k\Big) \le \Pr(B(n,p_i) \ge k). \qquad (1)$$



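Observe that, conditioned on $\mathcal{E}_i$, we have $\nu_{\ge i+1}(t') \le \mu_{\ge i+1}(t') = \sum_{t\in I(t')} Y_t$. Therefore

$$\Pr(\nu_{\ge i+1}(t') \ge k \mid \mathcal{E}_i) \le \Pr\Big(\sum_{t\in I(t')} Y_t \ge k \,\Big|\, \mathcal{E}_i\Big) \le \frac{\Pr(B(n,p_i) \ge k)}{\Pr(\mathcal{E}_i)}. \qquad (2)$$

Thus, taking $k = \beta_{i+1}$ and a union bound over the T time steps,

$$\Pr(\neg\mathcal{E}_{i+1} \mid \mathcal{E}_i) \le \frac{T\,\Pr(B(n,p_i) \ge \beta_{i+1})}{\Pr(\mathcal{E}_i)}.$$

Since

$$\Pr(\neg\mathcal{E}_{i+1}) \le \Pr(\neg\mathcal{E}_{i+1} \mid \mathcal{E}_i)\Pr(\mathcal{E}_i) + \Pr(\neg\mathcal{E}_i),$$

we have

$$\Pr(\neg\mathcal{E}_{i+1}) \le T\,\Pr(B(n,p_i) \ge \beta_{i+1}) + \Pr(\neg\mathcal{E}_i). \qquad (3)$$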

We can bound large deviations in the binomial distribution with the formula
(see for instance [3], Appendix A)

$$\Pr(B(n,p_i) \ge e\,p_i n) \le e^{-p_i n}. \qquad (4)$$
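Bound (4) is the Chernoff bound $\Pr(X \ge (1+\delta)\mu) \le (e^{\delta}/(1+\delta)^{1+\delta})^{\mu}$ with $1 + \delta = e$. A quick numerical sanity check in Python, computing the exact binomial tail (the helper names and the sample values of n and p are our own):

```python
import math
from math import comb


def binomial_tail(n, p, threshold):
    """Exact Pr(B(n, p) >= threshold) for a real threshold."""
    start = max(0, math.ceil(threshold))
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(start, n + 1))


def chernoff_rhs(n, p):
    """Right-hand side of (4): exp(-p n)."""
    return math.exp(-p * n)


for n, p in [(100, 0.02), (500, 0.01), (1000, 0.05)]:
    print(n, p, binomial_tail(n, p, math.e * p * n), chernoff_rhs(n, p))
```

The sizes are kept at n <= 1000 so that the exact binomial coefficients still fit in a Python float during the multiplication.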

We may then set $\beta_6 = n/(2e)$, and subsequently

$$\beta_i = \frac{e\,\beta_{i-1}^d}{n^{d-1}} \quad\text{for } i \ge 7.$$

Note that the $\beta_i$ are chosen so that $\Pr(B(n,p_i) \ge \beta_{i+1}) \le e^{-p_i n}$, since $\beta_{i+1} = e\,p_i n$.

With these choices $\mathcal{E}_6$ holds with certainty, as there cannot be more than
n/(2e) bins with at least 6 balls (n balls fill at most n/6 such bins). For $i \ge 6$,

$$\Pr(\neg\mathcal{E}_{i+1}) \le T e^{-p_i n} + \Pr(\neg\mathcal{E}_i) \le \frac{1}{n^{c_1+1}} + \Pr(\neg\mathcal{E}_i),$$

provided that $p_i n \ge (c_1 + c_2 + 1)\ln n$.

Let $i^*$ be the smallest value for which $p_{i^*-1}n \le (c_1 + c_2 + 1)\ln n$. Note that
$i^* \le \ln\ln n/\ln d + O(1)$. Note that the preceding argument cannot be used to
bound the number of bins with height at least $i^*$, as the Chernoff bounds are no
longer powerful enough; hence we must tackle the tail more directly. This requires
some careful attention to detail. In fact we will show that, given $\mathcal{E}_{i^*-1}$, then with
probability $1 - o(1/n^{c_1})$, there are no balls with height $i^* + \lceil (c_1+c_2+2)/(d-1)\rceil + 1$
over the course of the entire process.
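The schedule $\beta_i$ and the cut-off $i^*$ can be computed directly. The Python sketch below (a hypothetical helper of our own) follows the recurrence above; for the sample values n = 10^6, d = 2 and c_1 = c_2 = 1 it stops at $i^* = 10$, while $\ln\ln n/\ln d \approx 3.8$, so the difference is the O(1) term.

```python
import math


def beta_schedule(n, d, c1, c2):
    """beta_6 = n/(2e), beta_{i+1} = e * beta_i^d / n^(d-1), p_i = (beta_i / n)^d.

    Returns (betas, i_star), where i_star is the smallest index with
    p_{i_star - 1} * n <= (c1 + c2 + 1) ln n.
    """
    betas = {6: n / (2 * math.e)}
    threshold = (c1 + c2 + 1) * math.log(n)
    i = 6
    while (betas[i] / n) ** d * n > threshold:  # p_i * n still above the threshold
        betas[i + 1] = math.e * betas[i] ** d / n ** (d - 1)
        i += 1
    return betas, i + 1


betas, i_star = beta_schedule(10**6, d=2, c1=1, c2=1)
print(i_star, math.log(math.log(10**6)) / math.log(2))
```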
Let $\mathcal{F}_1$ be the event that $\nu_{\ge i^*}(t) \le (ec_1 + ec_2 + e)\ln n$ for all times $t \le T$. In
other words, at every time there are not too many bins with load at least $i^*$.
Then

$$\Pr(\neg\mathcal{F}_1) \le \Pr(\neg\mathcal{F}_1 \mid \mathcal{E}_{i^*-1})\Pr(\mathcal{E}_{i^*-1}) + \Pr(\neg\mathcal{E}_{i^*-1}),$$

and again by (1), (2), and (4)

$$\Pr(\neg\mathcal{F}_1 \mid \mathcal{E}_{i^*-1})\Pr(\mathcal{E}_{i^*-1}) \le T\Pr\big(B(n,(c_1+c_2+1)\ln n/n) \ge (ec_1+ec_2+e)\ln n\big) \le \frac{1}{n^{c_1+1}}.$$

Let $\mathcal{F}_2$ be the event that $\nu_{\ge i^*+1}(t) \le c_1 + c_2 + 2$ for all times $t \le T$. In other
words, at every time there are no more than a constant number of bins with load
at least $i^* + 1$. Again,

$$\Pr(\neg\mathcal{F}_2) \le \Pr(\neg\mathcal{F}_2 \mid \mathcal{F}_1)\Pr(\mathcal{F}_1) + \Pr(\neg\mathcal{F}_1).$$

Here

$$\Pr(\neg\mathcal{F}_2 \mid \mathcal{F}_1)\Pr(\mathcal{F}_1) \le T\Pr\big(B(n,((ec_1+ec_2+e)\ln n/n)^d) \ge c_1 + c_2 + 2\big).$$

The binomial expression can be checked to be $O(1/n^{c_1+c_2+1})$, and hence this last
term is also $O(1/n^{c_1+1})$.

Conditioned on $\mathcal{F}_2$, we must now show that throughout the process there
are no bins with load $i^* + \lceil (c_1+c_2+2)/(d-1)\rceil + 1$ with sufficiently high
probability. Let this event be given by $\mathcal{Q}$. Consider any specific time step z. For
there to be any bins with load $i^* + \lceil (c_1+c_2+2)/(d-1)\rceil + 1$ at time z, at
least $\lceil (c_1+c_2+2)/(d-1)\rceil$ of the balls in the system must have landed in bins
with load at least $i^* + 1$. Hence

$$\Pr(\neg\mathcal{Q}) \le \Pr(\neg\mathcal{Q} \mid \mathcal{F}_2)\Pr(\mathcal{F}_2) + \Pr(\neg\mathcal{F}_2).$$

Here

$$\Pr(\neg\mathcal{Q} \mid \mathcal{F}_2)\Pr(\mathcal{F}_2) \le T\Pr\big(B(n,((c_1+c_2+2)/n)^d) \ge \lceil (c_1+c_2+2)/(d-1)\rceil\big).$$

The binomial expression can be checked to be $O(1/n^{c_1+c_2+1})$, and hence this last
term is also $O(1/n^{c_1+1})$.

To conclude (abusing notation), we have

$$\Pr(\neg\mathcal{Q}) \le O\Big(\frac{1}{n^{c_1+1}}\Big) + \Pr(\neg\mathcal{E}_{i^*-1}) = o\Big(\frac{1}{n^{c_1}}\Big).$$

Here the last bound comes from the recurrence (3). That is, $\Pr(\neg\mathcal{E}_{i^*-1})$ is dominated
by the sum of O(log log n) events, each with probability $O(1/n^{c_1+1})$; the
theorem follows.



Note that the probability bounds of Theorem 1 can be improved by choosing a
smaller value of $i^*$, so that the strong Chernoff bounds hold, and then increasing
the maximum load allowed appropriately. Similarly, we can improve the theorem
so that a superpolynomial number of steps can be handled, although this requires
increasing the bound on the maximum load above log log n/log d + O(1) to, for
example, (1 + o(1)) log log n/log d.

3 Adversarial deletions: a witness tree argument

In this section, we provide a simple witness tree argument for the balls-and-bins 
problem with deletions and re-insertions. A similar argument appears in [6] for 
the more difficult problem of routing circuits in a butterfly network. Therefore,
our result is not new, in that it follows naturally from the argument in [6]. 
Rather, our goal is to present a self-contained and simplified version of the proof 
for the simpler balls-and-bins situation. For convenience, we focus on the case 
d = 2. 

We consider a variation $Q_d(v,w)$ of the process $P_d(v)$. Again, the process
begins with n insertions, followed by alternating insertions and deletions, with v
specifying the balls to be deleted. We now use w to represent insertions, however.
We assign each ball an identification number, and without loss of generality we
assume the first n balls have ID numbers 1 through n. At time n + j, the ball
with ID number $w_j$ is inserted. If this ball has never been inserted before, then
it is placed in the least loaded of d bins chosen independently and uniformly at 
random. If the ball has been inserted before, it is placed in the least loaded of 
the d bins chosen for its first insertion; that is, the bin choices of a ball are fixed
after it is first inserted in the system. We assume that v and w are consistent, 
so there is only one ball with a given ID number in the system at a time. Note 
also that v and w must be chosen by the adversary before the process begins, 
without reference to the random choices made during the course of the process. 

This scenario appears when, for example, we use a (random) hash function 
for the two bin choices of every ball. As before, when a ball is (re-)inserted, the 
algorithm places the ball in the bin with the smaller load. 
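The only change from $P_d(v)$ is that a ball's d choices are drawn once and reused on every re-insertion. A Python sketch of $Q_d(v,w)$ follows; it assumes (our reading, not stated in the text) that both $v_j$ and $w_j$ are given as ball IDs, and the adversary in the usage example, which repeatedly removes ball j and immediately re-inserts it, is hypothetical.

```python
import random


def simulate_qd(n, d, ops, seed=0):
    """Run Q_d(v, w) with ops = [(v_1, w_1), (v_2, w_2), ...] given as ball IDs.

    Returns (max load ever seen, final loads). A ball's d bin choices are drawn on its
    first insertion and reused on every re-insertion.
    """
    rng = random.Random(seed)
    load = [0] * n
    choices = {}  # ball ID -> its fixed d bin choices
    bin_of = {}   # ball ID -> bin currently holding it
    max_load = 0

    def insert(ball):
        nonlocal max_load
        if ball not in choices:
            choices[ball] = [rng.randrange(n) for _ in range(d)]
        target = min(choices[ball], key=lambda c: load[c])
        load[target] += 1
        bin_of[ball] = target
        max_load = max(max_load, load[target])

    for ball in range(1, n + 1):  # first n balls have IDs 1..n
        insert(ball)
    for v_j, w_j in ops:
        load[bin_of.pop(v_j)] -= 1  # KeyError if v_j is not in the system
        insert(w_j)
    return max_load, load


n = 1000
# Hypothetical adversary: remove ball ID j and immediately re-insert it, cyclically.
ops = [((j % n) + 1, (j % n) + 1) for j in range(5 * n)]
max_load, load = simulate_qd(n, 2, ops)
print(max_load)
```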

The main theorem of this section is stated below. The constants of the the- 
orem have been chosen for convenience and have not been optimized. Note that 
the techniques used to prove this theorem can be generalized to show that if 
each ball makes d bin choices for some constant d, then the maximum load of 
any bin is O(log log n) with high probability. The result can also be extended to
non-constant d.

Theorem 2. At any time t, with probability at least 1 − …, the maximum load
of a bin achieved by process $Q_2(v,w)$ is at most 4 log log n.

Proof. We prove the theorem in two parts. First, we show that if there is a bin r
at time t with 4ℓ balls, where ℓ = log log n, there exists a degree ℓ pruned witness
tree. Next, we show that, with high probability, no degree ℓ pruned witness tree
exists.
Constructing a witness tree. In a witness tree, each node represents a bin
and each edge $(r_i, r_j, t_k)$ represents a ball that was inserted at time $t_k$ whose
two bin choices are $r_i$ and $r_j$. Suppose that some bin r has load 4ℓ at time t.
We construct the witness tree as follows. The root of the tree corresponds to
bin r. Let $b_1, \ldots, b_{4\ell}$ be the balls in r at time t. Let $r_i$ be the other bin choice
associated with ball $b_i$ (one of the choices is bin r). The root r has 4ℓ children,
one corresponding to each bin $r_i$. Let $t_i \le t$ be the last time $b_i$ was (re-)inserted
into the system. Without loss of generality, assume that $t_1 < t_2 < \cdots < t_{4\ell}$. Note
that the height of ball $b_i$ when it was inserted at time $t_i$ is at least i. Therefore,
the load of bin $r_i$, the other choice of $b_i$, is at least i − 1 at time $t_i$. We use this
fact to recursively grow a tree rooted at each $r_i$.

The witness tree we have described is irregular. However, it contains as a
subgraph an ℓ-ary tree of height ℓ such that

- The root in level 0 has ℓ children that are internal nodes.

- Each internal node on levels 1 to ℓ − 2 has two children that are internal
nodes and ℓ − 2 children that are leaves.

- Each internal node on level ℓ − 1 has ℓ children that are leaves.
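As a sanity check on the size of this subtree, the sketch below counts its nodes level by level for ℓ ≥ 2. The closed form ℓ²·2^(ℓ−1) − ℓ² + ℓ + 1 that it is compared against is our own calculation from the three rules above, not a formula stated in the text; it shows that the subtree has at most ℓ²·2^ℓ nodes.

```python
def actual_witness_tree_size(ell):
    """Number of nodes in the ell-ary subtree described above (ell >= 2)."""
    if ell < 2:
        raise ValueError("ell must be at least 2")
    internal = [1, ell]   # internal[k] = internal nodes on level k; root has ell internal children
    leaves = 0
    for level in range(1, ell - 1):           # levels 1 .. ell-2
        internal.append(2 * internal[level])  # two internal children each
        leaves += (ell - 2) * internal[level] # plus ell-2 leaf children each
    leaves += ell * internal[ell - 1]         # level ell-1: ell leaf children each
    return sum(internal) + leaves


for ell in range(2, 8):
    closed_form = ell**2 * 2**(ell - 1) - ell**2 + ell + 1
    print(ell, actual_witness_tree_size(ell), closed_form, ell**2 * 2**ell)
```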

For convenience we refer to this subtree as the actual witness tree henceforth. 
Constructing a pruned witness tree. If the nodes of the witness tree are 
guaranteed to represent distinct bins, proving our probabilistic bound is a rel- 
atively easy matter. However, this is not the case; a bin may reappear several 
times in a witness tree, leading to dependencies that are difficult to resolve. This 
makes it necessary to prune the tree so that each node in the tree represents 
a distinct bin. Consequently, the balls represented by the edges of the pruned 
witness tree are also distinct. In this regard, note that a ball appears at most 
once in a pruned witness tree, even if it was (re-)inserted multiple times in the 
sequence. 

We visit the nodes of the witness tree iteratively in breadth-first search order 
starting at the root. As we proceed, we remove (i.e., prune) some nodes of the 
tree and the subtrees rooted at these nodes - what remains is the pruned witness 
tree. We start by visiting the root. In each iteration, we visit the next node v in 
breadth-first order that has not been pruned. Let B(v) denote the nodes visited
before v.

- If v represents a bin that is different from the bins represented by nodes in
B(v), we do nothing.

- Otherwise, prune all nodes in the subtree rooted at v. Then, we mark the
edge from v to its parent as a cutoff edge.

Note that the cutoff edges are not part of the pruned witness tree. The procedure
continues until either no more nodes remain to be visited or there are ℓ cutoff
edges. In the latter case, we apply a final pruning by removing all nodes that are
yet to be visited. The tree that results from this pruning process is the pruned
witness tree. After the pruning is complete, we make a second pass through
the tree and construct a set C of cutoff balls. Initially, C is set to ∅. We visit
the cutoff edges in BFS order and for each cutoff edge (u, v) we add the ball
corresponding to (u, v) to C, if this ball is distinct from all balls currently in C
and if |C| < ⌈p/2⌉, where p is the total number of cutoff edges.
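The two passes can be sketched as follows. The tree representation (dictionaries mapping each node to its children, to the bin it represents, and each child to the ball on the edge to its parent) is a hypothetical choice made for illustration. Children of a pruned node are never enqueued, so the queue visits the unpruned nodes in breadth-first order.

```python
from collections import deque

def prune_witness_tree(children, bin_of, ball_of, root, ell):
    """Breadth-first pruning of a witness tree (hypothetical representation).

    children: dict node -> list of child nodes
    bin_of:   dict node -> bin represented by the node
    ball_of:  dict child node -> ball represented by the edge to its parent
    Returns (kept nodes in BFS order, cutoff edges, cutoff-ball list C).
    """
    kept, seen_bins, cutoff = [], set(), []
    queue = deque([(root, None)])
    while queue:
        v, parent = queue.popleft()
        if bin_of[v] in seen_bins:
            # Bin already represented by a visited node: prune v's subtree
            # (its children are never enqueued) and mark (parent, v) as cutoff.
            cutoff.append((parent, v))
            if len(cutoff) == ell:
                break  # final pruning: everything still queued is dropped
            continue
        seen_bins.add(bin_of[v])
        kept.append(v)
        queue.extend((c, v) for c in children.get(v, []))

    # Second pass: collect ceil(p/2) distinct cutoff balls in BFS order.
    limit = (len(cutoff) + 1) // 2
    C = []
    for _, v in cutoff:
        if ball_of[v] not in C and len(C) < limit:
            C.append(ball_of[v])
    return kept, cutoff, C


children = {"r": ["a", "b"], "a": ["c", "d"]}
bin_of = {"r": 0, "a": 1, "b": 2, "c": 2, "d": 3}
ball_of = {"a": 10, "b": 11, "c": 12, "d": 13}
kept, cutoff, C = prune_witness_tree(children, bin_of, ball_of, "r", ell=4)
print(kept, cutoff, C)   # ['r', 'a', 'b', 'd'] [('a', 'c')] [12]
```

In the example, node c repeats bin 2 (already represented by b), so the edge (a, c) becomes the only cutoff edge and its ball is the single cutoff ball.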

Lemma 1. The pruned witness tree constructed above has the following proper-
ties.

1. All nodes in the pruned witness tree represent distinct bins.

2. All edges in the pruned witness tree represent distinct balls. (Note that cutoff
edges are not included in the pruned witness tree.)

3. The cutoff balls in C are distinct from each other, and from the balls repre-
sented in the pruned witness tree.

4. There are ⌈p/2⌉ cutoff balls in C, where p is the number of cutoff edges.

Proof. The first three properties follow from the construction. We prove the
fourth property as follows. Let b be a ball represented by some cutoff edge, and
let v and w be its bin choices. Since v and w can each appear at most once as
nodes in the pruned witness tree, ball b can be represented by at most two cutoff
edges. Thus, the p cutoff edges represent at least ⌈p/2⌉ distinct balls, and the
second pass adds balls to C until |C| = ⌈p/2⌉.

Enumerating pruned witness trees. 

We bound the probability that a pruned witness tree exists by bounding both 
the number of possible pruned witness trees and the probability that each such 
tree could arise. First, we choose the shape of the pruned witness tree. Then, 
we traverse the tree in breadth-first order and bound the number of choices for 
the bins for each tree node and the balls for each tree edge; we also bound the 
associated probability that these choices came to pass. Finally, we consider the 
number of choices for cutoff balls in C and the corresponding probability that 
they arose. Multiplying these quantities together yields the final bound; it is
important to note here that we can multiply these terms together only because
all the balls and the bins in the pruned witness tree and the cutoff balls in C are
distinct.

Choosing the shape of the pruned witness tree. Assume that there are p cutoff
edges in the pruned tree. The number of ways of selecting the p cutoff edges is
at most

$$\binom{\ell^2 2^\ell}{p} \le \left(\ell^2 2^\ell\right)^p,$$

since there are at most ℓ²2^ℓ nodes in the pruned witness tree.

Ways of choosing balls and bins for the nodes and edges of the pruned witness 
tree. The enumeration proceeds by considering the nodes in BFS order. The 
number of ways of choosing the bin associated with the root is n. Assume that 
you are considering the internal node v_i of the pruned witness tree whose
bin has already been chosen to be r_j. Let v_i have S_i children. We evaluate the
number of ways of choosing a distinct bin for each of the S_i children of v_i and
choosing a distinct ball for each of the S_i edges incident on v_i and weight it by
multiplying by the appropriate probability. We call this product E_i.
There are at most binom(n, S_i) ways of choosing distinct bins for each of the S_i
children of v_i. Also, since there are at most n balls in the system at any point in
time, the number of ways to choose distinct balls for the S_i edges incident on v_i
is also at most binom(n, S_i). (Note that the n balls in the system may be different
for each v_i; however, there are still at most binom(n, S_i) possibilities for the ball
choices for any vertex.) There are S_i! ways of pairing the balls and the bins, and
the probability that a chosen ball chooses bin r_j and a specific one of the S_i bins
chosen above is 2/n². Thus,

$$E_i \le \binom{n}{S_i}^2 S_i!\left(\frac{2}{n^2}\right)^{S_i} \le \frac{2^{S_i}}{S_i!}. \qquad (5)$$

Let m be the number of internal nodes v_i in the pruned witness tree such that
S_i = ℓ. Using the bound in Equation 5 for only these m nodes, the number of
ways of choosing the bins and balls for the nodes and edges, respectively, of the
pruned witness tree weighted by the probability these choices occurred is at most

$$n\cdot\left(\frac{2^\ell}{\ell!}\right)^m.$$
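The counts in the preceding paragraph multiply to binom(n, S_i)²·S_i!·(2/n²)^{S_i}; since binom(n, s) ≤ n^s/s!, this is at most 2^{S_i}/S_i!. The short check below verifies that inequality with exact rational arithmetic for a few small n. The helper names are ours.

```python
from fractions import Fraction
from math import comb, factorial

def weighted_count(n, s):
    """C(n,s)^2 * s! * (2/n^2)^s: bin choices x ball choices x pairings x probability."""
    return comb(n, s) ** 2 * factorial(s) * Fraction(2, n * n) ** s

def simple_bound(s):
    """The bound 2^s / s!, which follows from C(n,s) <= n^s / s!."""
    return Fraction(2 ** s, factorial(s))


for n in (5, 10, 40):
    assert all(weighted_count(n, s) <= simple_bound(s) for s in range(n + 1))
print(weighted_count(10, 3), simple_bound(3))
```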



Ways of choosing the cutoff balls in C. Using Lemma 1, we know that there
are ⌈p/2⌉ distinct cutoff balls in C. The number of ways of choosing the balls
in C is at most n^{⌈p/2⌉}, since at any time step there are at most n balls in the
system to choose from. Note that a cutoff ball has both its bin choices in the
pruned witness tree. Therefore, the probability that a given ball is a cutoff ball
is at most

$$\left(\frac{\ell^2 2^\ell}{n}\right)^2.$$

Thus the number of choices for the ⌈p/2⌉ cutoff balls in C weighted by the
probability these cutoff balls occurred is at most

$$n^{\lceil p/2\rceil}\left(\frac{\ell^2 2^\ell}{n}\right)^{2\lceil p/2\rceil} \le \left(\frac{\ell^4 2^{2\ell}}{n}\right)^{\lceil p/2\rceil}.$$



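Putting it all together. The probability at time t of there existing a pruned
witness tree with p cutoff edges, and m internal nodes with ℓ = log log n children,
is at most

$$\left(\ell^2 2^\ell\right)^p\cdot n\cdot\left(\frac{2^\ell}{\ell!}\right)^m\cdot\left(\frac{\ell^4 2^{2\ell}}{n}\right)^{\lceil p/2\rceil} \le n\cdot\left(\frac{2^\ell}{\ell!}\right)^m\cdot\left(\frac{\ell^8 2^{4\ell}}{n}\right)^{\lceil p/2\rceil}$$

$$\le n\cdot\left(\frac{2e}{\log\log n}\right)^{m\log\log n}\cdot\left(\frac{(\log\log n)^8\log^4 n}{n}\right)^{\lceil p/2\rceil}. \qquad (6)$$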



Observe that either the number of cutoff edges, p, equals ℓ or the number of
internal nodes with ℓ children, m, is at least 2^{ℓ−2} = (log n)/4. Thus, in either
case, the bound in Equation 6 is 1/n^{Ω(log log n)}. Further, since there are at most
ℓ²2^ℓ values for p, the total probability of a pruned witness tree is at most
ℓ²2^ℓ · 1/n^{Ω(log log n)}, which is 1/n^{Ω(log log n)}. This completes the proof of the theorem.



4 Deletions based on item age 

We now consider an alternative scenario, where items are deleted in order of 
their insertion time. We define a process based on phases: in the first phase, 
there are n insertions, where again all insertions are made by putting a ball into
the least loaded of d bins chosen independently and uniformly at random. In 
subsequent phases, there are first n insertions, and then the items inserted in 
the previous phase are deleted. 

One way to view this process is as a modified version of the process P_d(v)
in the case where v = (1, 2, 3, ...). The difference here is that deletions and
insertions do not alternate; instead, deletion requests are batched and acted
on at the end of a phase. (Unfortunately, we do not yet have an argument
showing that bounds on this modified version of P_d((1, 2, ...)) necessarily also
hold for the original process, even though it seems natural to conclude that
batching deletions until a later time can only worsen performance.)

This phase-based system allows us to regard the state of each bin as a two- 
dimensional variable. A bin is said to be in state (i,j) if it has i balls that will 
be deleted in the next deletion phase and at least j balls that have been inserted 
in the current insertion phase. Such two-dimensional models have previously 
proven useful for dynamic variations of load balancing problems [11]. We prove
bounds for this phase-based system; for convenience we consider only the case
d = 2.
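The phase-based process is easy to simulate. The sketch below is our own illustration (the function name, the seeded generator and the tie-breaking toward the first choice are assumptions): it runs several phases with d = 2 and records the largest load observed. Theorem 3 below predicts a maximum load of log log n/log 2 + O(1); a single run at finite n is only a rough illustration of that behaviour, not evidence for the constant.

```python
import random

def phased_two_choice(n, phases, seed=0):
    """Phase-based process with d = 2: each phase inserts n balls (each into the
    less loaded of two uniformly random bins), then deletes all balls inserted
    in the previous phase. Returns final loads and the largest load seen."""
    rng = random.Random(seed)
    load = [0] * n
    previous = []          # bins of the balls inserted in the previous phase
    max_load = 0
    for _ in range(phases):
        current = []
        for _ in range(n):
            a, b = rng.randrange(n), rng.randrange(n)
            target = a if load[a] <= load[b] else b
            load[target] += 1
            current.append(target)
            max_load = max(max_load, load[target])
        for bin_index in previous:   # batched deletion at the end of the phase
            load[bin_index] -= 1
        previous = current
    return load, max_load


load, max_load = phased_two_choice(n=10_000, phases=20)
print(sum(load), max_load)
```

After every phase except the first, exactly the n balls of the current phase remain, so at most 2n balls are ever present at once.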



Theorem 3. For any fixed constant c, the maximum load of a bin achieved
by process P_2((1, 2, 3, ...)) over T = n^c steps is log log n/log 2 + O(1) with high
probability.



Proof (Sketch). We adapt Theorem 1 so that it works on phases. That is, consider
a time interval of n insertions preceding a deletion phase. Let X_{i,j}(t) be the
number of bins with i balls that will be deleted and at least j newly inserted
balls after the t-th insertion. Note that X_{i,j}(0) = 0 unless j = 0.

Our goal is to show there is a simple "stable" bounding distribution with
the following property: if we begin with X_{i,0}(0) ≤ β_i for some appropriate
sequence β_i, then after n insertions and n deletions, we again have X_{i,0}(0) ≤ β_i
with suitably high probability. This will imply that the process continues for
polynomially many steps before the load becomes too high, assuming we begin
properly.

Suppose 






, , OiTl 9 

0 < — 7 ^ 

i 



for sufficiently large i > L, where L and γ are suitable constants. (For example,
we may take L = 20, a = 1/20, and 7 = (1 — ^); note 7 ^ w i.) It can be 
checked that this condition holds after the first n insertions in a straightforward 
manner. We will show that 



Xi,j(n) < 



an 

i^j 



2^+3 

7 



for i + j > L with high probability. 

Further, let X_{i,0} = Σ_{k≥0} X_{k,i}(n); X_{i,0} is just the number of bins with i balls
after the insertions and deletions complete. We will also show that X_{i,0} ≤ β_i
for i > L with high probability; this will allow us to continue the process for
polynomially many steps. (Technically, we only require these conditions to
hold up to the point where i + j is log log n/log 2 + O(1), so that there are
still Ω(log n) bins of this height. Once the number of bins at a state becomes
sufficiently small so that Chernoff bounds no longer apply, we must use a more
explicit tail argument, as in Theorem 1. This affects the O(1) term of the theo-
rem, which depends on c. We skip these details here.)

We prove this inductively on i + j in a similar fashion to the induction in
Theorem 1. Define



ViJ 



def a 2 *+^ 

i+j' 



Let E_g be the event that X_{i,j}(n) ≤ y_{i,j} for all i + j = g. Now for a fixed pair
(i, j + 1) with i + j = g consider a series of binary random variables Y_t for
t = 1, ..., n, where Y_t = 1 iff the t-th ball lands in a bin in state (i, j + 1) (after
its entry) and E_g holds. (The value Y_t is 1 if the height of ball t is at least
i + j + 1 and i balls in its bin are to be deleted, while the number of such
bins has not grown too large.)

Let ω_i represent the choices available to the i-th ball. Then



Pr(Y) 



<vlj + 2 ^ yijVk,! + 2 yijVkfi = Pi,j- 

k-^l=g k~>g 



We simplify the above expression: 



( 2 2*+J+^ \ / 

TYFI h+2(.+j) + 2y;i 



k>g 



< So; 



ay 



i + j 



In the last inequality, we bounded the last summation using the specified value
for γ and the fact that g > L.

By the Chernoff bound P* — hkiyij with probability at least 1 — 1/ 
as long as Pij > (2c -1-2) Inn. But here Yj = Xij{n) conditioned on £g. 

Hence, as long as z -I- j = o(logn). 



Pr(-£g+i \£g) = o • 



Similarly, conditioned on all E_g for g > L, for i > L + 1,



Xifi = ^ Xk,i{r 



k>0 



aj 



< V ^ 



k>0 



2(i + k) 



(y. 9"i 

< -7^ • 

I 
Hence, with high probability, the inductive argument goes through, handling
all levels of height at least L + 1. As the probability of failure in any phase is
polynomially small, we can go through polynomially many phases before a
failure with high probability.

Note, however, that in using the assumption that the previous phase was well
bounded to obtain the bound for the next one, we have "lost" the appropriate
bound for X_{L,0}. This is because to bound X_{L,0}(0) we would need to have an
initial bound on X_{L−1,0}(0), which we lack. This problem can be easily handled
by noting that the number of bins with at least L balls at any point in the
process is stochastically dominated by the number of bins with at least L balls
when 4n balls are thrown into the n bins; this is because we could consider
the distribution when each of the at most 2n balls in the system has a twin,
and a ball or its twin goes into each of the two bins a ball chooses from. This
distribution clearly dominates the distribution present in the system. For L = 20
and the parameters chosen, we can conclude that at the beginning of every phase
E_L fails only with probability exponentially small in n, so the inductive proof goes
through.

As for Theorem 1, the probability of failure can be made lower by increasing 
the bound on the maximum load appropriately. 
Abstract. Suppose we sequentially throw m balls into n bins. It is a
natural question to ask for the maximum number of balls in any bin.
In this paper we shall derive sharp upper and lower bounds which are
reached with high probability. We prove bounds for all values of m(n) ≥
n/polylog(n) by using the simple and well-known method of the first and
second moment.



1 Introduction 

Suppose that we sequentially throw m balls into n bins by placing each ball into 
a bin chosen independently and uniformly at random. It is a natural question 
to ask for the maximum number of balls in any bin. This very simple model has 
many applications in computer science. Here we name just two of them: 

Hashing: The balls-into-bins model may be used to analyze the efficiency of
hashing algorithms. In the case of so-called separate chaining, all keys that
hash to the same location in the table are stored in a linked list. It is clear that
the lengths of these lists are a measure of the complexity. For a well-chosen
hash function (i.e., a hash function which assigns the keys to all locations in the
table with the same probability), the lengths of the lists have exactly the same
distribution as the number of balls in a bin.

Online Load Balancing: With the growing importance of parallel and distributed 
computing the load balancing problem has gained considerable attention during 
the last decade. A typical application for online load balancing is the following 
scenario: consider n database servers and m requests which arise independently
at different clients and which may be handled by any server. The problem is to
assign the requests to the servers in such a way that all servers handle (about)
the same number of requests.

Of course, by introducing a central dispatcher one can easily achieve uniform
load on the servers. However, within a distributed setting the use of such a
central dispatcher is highly undesirable. Instead, randomized strategies have been
applied very successfully for the development of good and efficient load balancing
algorithms. In their simplest version each request is assigned to a server chosen
independently and uniformly at random. If all requests are of the same size, the
maximum load of a server then corresponds exactly to the maximum number of
balls in a bin in the balls-into-bins model introduced above.



1.1 Previous results 

Balls and bins games have been intensively studied in the literature, cf.,
e.g., [JK77]. The estimation of the maximum number of balls in any bin was
originally mainly studied within the context of hashing functions. In particu-
lar, Gonnet [Gon81] determined for the case m = n that the expected num-
ber of balls in the bin containing the maximum number of balls is
Γ^{−1}(n) − 3/2 + O(1/log Γ^{−1}(n)). One easily checks that Gonnet's result implies
that the maximum load of any bin is with high probability (log n/log log n)(1 + o(1)).
In his dissertation Mitzenmacher [Mit96] also included a simpler proof of the fact
that the maximum load is Θ(log n/log log n). He also obtains some results for the
case m < n/log n. For the case m > n log n it was well known that the maximum
load of any bin is Θ(m/n), i.e., of the order of the mean. However, the precise
deviation from the mean seems not to have been studied before.

We note that different models of balls-into-bins games have also been studied
for online load balancing. We note in particular the approach of Azar et
al. [ABKU92]. They study the following model: each ball picks d bins uniformly
at random and places itself in the bin containing the fewest balls. For the case
m = n, [ABKU92] showed that in this model the maximum load of any bin drops
exponentially from (log n/log log n)(1 + o(1)) to (log log n/log d)(1 + o(1)). Compare
also Czumaj and Stemann [CS97] for more results in this direction.

1.2 Our results 

In this paper we apply the first and second moment method, a well-known tool
within the theory of random graphs, cf., e.g., [Bol85], to obtain a straightforward
proof of the fact that the maximum number of balls in a bin is
(log n/log log n)(1 + o(1)) for m = n with probability 1 − o(1).

Besides being a lot more elementary than Gonnet's proof method, the big advan-
tage of our method is that it also easily generalizes to the case where m ≠ n balls
are placed into n bins. In particular, this allows us to also analyze the case m ≫ n,
which can be handled neither by Gonnet's approach nor by Mitzenmacher's.
(Both are based on approximating the binomial distribution B(m, 1/n) by a Pois-
son distribution, which only gives tight bounds if m · 1/n is a constant.) The
case m ≫ n is particularly important for the load-balancing scenario mentioned
above. Here it measures, e.g., how the asymmetry between different servers grows
over time when more and more requests arrive.

Our results are summarized in the following theorem: 
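Theorem 1. Let M be the random variable that counts the maximum number
of balls in any bin, if we throw m balls independently and uniformly at random
into n bins. Then Pr[M > k_α] = o(1) if α > 1 and Pr[M > k_α] = 1 − o(1) if
0 < α < 1, where

$$k_\alpha = \begin{cases}
\dfrac{\log n}{\log\frac{n\log n}{m}}\left(1 + \alpha\,\dfrac{\log^{(2)}\frac{n\log n}{m}}{\log\frac{n\log n}{m}}\right), & \text{if } \frac{n}{\mathrm{polylog}(n)} \le m \ll n\log n,\\[2ex]
(d_c - 1 + \alpha)\log n, & \text{if } m = c\cdot n\log n \text{ for some constant } c,\\[1ex]
\dfrac{m}{n} + \alpha\sqrt{2\,\dfrac{m}{n}\log n}, & \text{if } n\log n \ll m \le n\cdot\mathrm{polylog}(n),\\[2ex]
\dfrac{m}{n} + \sqrt{\dfrac{2m\log n}{n}\left(1 - \dfrac{1}{\alpha}\,\dfrac{\log^{(2)} n}{2\log n}\right)}, & \text{if } m \gg n(\log n)^3.
\end{cases}$$

Here d_c denotes a suitable constant depending only on c, cf. the proof of
Lemma 3.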
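The thresholds k_α can be evaluated numerically with the sketch below. Because the four cases are defined by asymptotic conditions (≪, polylog), they cannot be decided from a single finite pair (m, n), so the caller chooses the regime explicitly; the regime labels and passing d_c in by hand are our own conventions. Logarithms are natural.

```python
import math

def k_alpha(m, n, alpha, regime, d_c=None):
    """Threshold k_alpha from Theorem 1 (natural logarithms).

    regime: "sparse"   for n/polylog(n) <= m << n log n,
            "critical" for m = c * n log n (pass the constant d_c),
            "moderate" for n log n << m <= n polylog(n),
            "dense"    for m >> n (log n)^3.
    """
    L = math.log(n)
    if regime == "sparse":
        r = n * L / m                      # n log n / m
        lr = math.log(r)
        return (L / lr) * (1 + alpha * math.log(lr) / lr)
    if regime == "critical":
        if d_c is None:
            raise ValueError("d_c is required in the critical regime")
        return (d_c - 1 + alpha) * L
    if regime == "moderate":
        return m / n + alpha * math.sqrt(2 * (m / n) * L)
    if regime == "dense":
        return m / n + math.sqrt(2 * m * L / n * (1 - math.log(L) / (2 * alpha * L)))
    raise ValueError(f"unknown regime {regime!r}")


n = 10**6
print(k_alpha(n, n, 1.0, "sparse"))   # m = n
```

For m = n the "sparse" case reduces to (log n/log log n)(1 + α log^{(3)} n/log^{(2)} n), the refined form of the classical log n/log log n bound.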



The paper is organized as follows: in § 2 we give a brief overview of the first
and second moment method, in § 3 we show how to apply this method within
the balls-into-bins scenario and obtain in § 4 the (log n/log log n)(1 + o(1)) bound for
m = n. In § 5 we then present some more general tail bounds for binomial
random variables and combine them with the first and second moment method
to obtain a proof of Theorem 1.



1.3 Notations 

Throughout this paper m denotes the number of balls and n the number of bins.
The probability that a ball is thrown into a fixed bin is given by p := 1/n. We
define q by q := 1 − p. We shall denote the iterated log by log^{(i)}, i.e., log^{(1)} x =
log x and log^{(k+1)} x = log(log^{(k)} x) for all k ≥ 1. In this paper logarithms are to
the base e.

Asymptotic notations (O(·), o(·) and ω(·)) are always with respect to n; f ≪ g
means f = o(g) and f ≫ g means f = ω(g). We use the term polylog(x) to
denote the class of functions ∪_{α≥1} O((log x)^α). We say that an event E occurs
with high probability if Pr[E] = 1 − o(1).

2 The first and second moment method 

Let X be a non-negative integer-valued random variable. Then Markov's inequality
implies that Pr[X ≥ 1] ≤ E[X]. Hence, we have

$$E[X] = o(1) \;\Rightarrow\; \Pr[X = 0] = 1 - o(1).$$

Furthermore, Chebyshev's inequality implies that

$$\Pr[X = 0] \le \Pr\big[|X - E[X]| \ge E[X]\big] \le \frac{E[X^2] - (E[X])^2}{(E[X])^2}. \qquad (1)$$

Hence, in order to show that Pr[X = 0] = o(1) we just have to verify that

$$E[X^2] = (1 + o(1))(E[X])^2. \qquad (2)$$

While it is often quite tedious to verify (2), it is relatively easy if we can write X
as the sum of (not necessarily independent) 0-1 variables X_1, ..., X_n that satisfy

$$E[X_i] = E[X_j] \quad\text{and}\quad E[X_iX_j] \le (1 + o(1))(E[X_i])^2 \qquad \forall\, 1 \le i < j \le n. \qquad (3)$$

Then, using X_i² = X_i,

$$E[X^2] = E\Big[\Big(\sum_{i=1}^n X_i\Big)^2\Big] = \sum_{i=1}^n E[X_i^2] + \sum_{i \ne j} E[X_iX_j] \le E[X] + (1 + o(1))(E[X])^2,$$

and combining this with (1) and the Markov bound above we obtain

$$\Pr[X = 0] = \begin{cases} 1 - o(1), & \text{if } E[X] = o(1),\\ o(1), & \text{if } E[X] \to \infty. \end{cases} \qquad (4)$$

This is the form which we will use for the analysis of the balls and bins scenario.



3 Setup for the analysis 



Let Y_i = Y_i(m, n) be the random variable which counts the number of balls in the
i-th bin if we throw m balls independently and uniformly at random into n bins.
Clearly, Y_i is a binomially distributed random variable: we express this fact by
writing Y_i ~ B(m, 1/n), respectively

$$\Pr[Y_i = k] = b(k; m, 1/n) := \binom{m}{k}\Big(\frac1n\Big)^k\Big(1 - \frac1n\Big)^{m-k}.$$

Let X_i = X_i(m, n, α) be the random variable which indicates whether Y_i is at
least k_α = k_α(m, n) (the function from Theorem 1), and let X = X(m, n, α) be
the sum over all X_i's, i.e.:

$$X := \sum_{i=1}^n X_i \quad\text{and}\quad X_i := \begin{cases}1, & \text{if } Y_i \ge k_\alpha,\\ 0, & \text{otherwise.}\end{cases}$$

Clearly,

$$E[X_i] = \Pr[B(m, 1/n) \ge k_\alpha] \quad\text{for all } i = 1, \dots, n,$$

and

$$E[X] = n\cdot\Pr[B(m, 1/n) \ge k_\alpha]. \qquad (5)$$

In order to apply (4) we therefore need good bounds for the tail of the binomial
distribution. Before we obtain these for the general case of all m = m(n) we
consider the special case m = n.
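For small parameters the quantities of § 2 and § 3 can be computed exactly, which makes the decomposition X = Σ X_i concrete. The sketch below (helper names are ours; it assumes n ≥ 2) computes E[X_i] from the binomial tail, E[X_1X_2] from the multinomial distribution of two fixed bins, E[X²] = E[X] + n(n − 1)E[X_1X_2] by symmetry, and the Chebyshev bound (1) on Pr[X = 0]. In the small instance used in the test, E[X_1X_2] does not exceed E[X_1]², in line with the second part of condition (3).

```python
from fractions import Fraction
from math import comb, factorial

def tail(m, n, k):
    """Pr[B(m, 1/n) >= k] computed exactly: this is E[X_i]."""
    p = Fraction(1, n)
    return sum(comb(m, j) * p**j * (1 - p)**(m - j) for j in range(k, m + 1))

def joint_tail(m, n, k):
    """Pr[Y_1 >= k and Y_2 >= k] for two fixed bins: this is E[X_1 X_2]."""
    p = Fraction(1, n)
    total = Fraction(0)
    for k1 in range(k, m + 1):
        for k2 in range(k, m - k1 + 1):
            ways = factorial(m) // (factorial(k1) * factorial(k2) * factorial(m - k1 - k2))
            total += ways * p**(k1 + k2) * (1 - 2 * p)**(m - k1 - k2)
    return total

def second_moment_bound(m, n, k):
    """Return E[X], E[X^2] and the Chebyshev bound (1) on Pr[X = 0] (needs n >= 2)."""
    ex = n * tail(m, n, k)
    # E[X^2] = sum_i E[X_i] + sum_{i != j} E[X_i X_j], using X_i^2 = X_i and symmetry.
    ex2 = ex + n * (n - 1) * joint_tail(m, n, k)
    bound = (ex2 - ex**2) / ex**2 if ex > 0 else None
    return ex, ex2, bound


print(second_moment_bound(2, 2, 1))   # (Fraction(3, 2), Fraction(5, 2), Fraction(1, 9))
```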
4 The case m = n

The aim of this section is to present a self-contained proof of the fact that if
m = n the maximum number of balls in a bin is (log n/log log n)(1 + o(1)) with high
probability. We will do this by showing that

$$\Pr\Big[\exists \text{ at least one bin with} \ge \alpha\,\frac{\log n}{\log\log n} \text{ balls}\Big] = \begin{cases} 1 - o(1), & \text{if } 0 < \alpha < 1,\\ o(1), & \text{if } \alpha > 1.\end{cases} \qquad (6)$$

Note that the claim of equation (6) is slightly weaker than the corresponding
one from Theorem 1. We consider this case first, as here the calculations stay
slightly simpler. So, in the rest of this section we let k_α := α log n/log log n.

Recall from § 2 that in order to do this we only have to show that condition (3)
is satisfied for the random variables X_i introduced in the previous section and
that

$$E[X] = n\cdot\Pr\Big[B\Big(n, \frac1n\Big) \ge \alpha\,\frac{\log n}{\log\log n}\Big] \to \begin{cases}\infty, & \text{if } 0 < \alpha < 1,\\ 0, & \text{if } \alpha > 1.\end{cases} \qquad (7)$$

The fact that E[X_i] = E[X_j] for all 1 ≤ i < j ≤ n follows immediately from the
definition of the X_i's. The proof of the second part of (3) is deferred to the end
of this section. Instead we start with the verification of (7). For that we prove a
small lemma on the binomial distribution. We state it in a slightly more general
form than necessary, as this version will be helpful later on.

Lemma 1. Let p = p(m) depend on m. Then for all h ≥ 1

$$\Pr[B(m, p) \ge mp + h] = \Big(1 + O\Big(\frac{mp}{h}\Big)\Big)\cdot b(mp + h; m, p).$$

Proof. Observe that for all k ≥ mp + h:

$$\frac{b(k+1; m, p)}{b(k; m, p)} = \frac{(m - k)p}{(k + 1)(1 - p)} \le \frac{((1 - p)m - h)p}{(mp + h + 1)(1 - p)} =: \lambda.$$

One easily checks that λ < 1 for h ≥ 1. Thus

$$\sum_{k \ge mp + h} b(k; m, p) \le b(mp + h; m, p)\cdot\sum_{i \ge 0}\lambda^i = \frac{1}{1 - \lambda}\, b(mp + h; m, p).$$

As 1/(1 − λ) ≤ 1 + mp/h, the claim of the lemma follows.

□
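Lemma 1 can be checked numerically. The sketch below (helper names ours) verifies, with exact rational arithmetic, the two-sided estimate b(mp + h; m, p) ≤ Pr[B(m, p) ≥ mp + h] ≤ (1 + mp/h)·b(mp + h; m, p) that the proof establishes. It assumes p is given as a `Fraction` such that mp + h is an integer.

```python
from fractions import Fraction
from math import comb

def b(k, m, p):
    """Binomial probability b(k; m, p), exact when p is a Fraction."""
    return comb(m, k) * p**k * (1 - p)**(m - k)

def lemma1_holds(m, p, h):
    """Check b(mp+h) <= Pr[B(m,p) >= mp+h] <= (1 + mp/h) * b(mp+h).

    Assumes mp + h is an integer (choose p as a Fraction accordingly)."""
    t = m * p + h
    if t.denominator != 1:
        raise ValueError("mp + h must be an integer")
    t = int(t)
    tail = sum(b(k, m, p) for k in range(t, m + 1))
    return b(t, m, p) <= tail <= (1 + m * p / h) * b(t, m, p)


print(all(lemma1_holds(50, Fraction(1, 10), h) for h in range(1, 46)))
```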



We apply Lemma 1 for "m" = n, "p" = 1/n and "mp + h" = k_α. Subsequently,
we use Stirling's formula x! = (1 + o(1))√(2πx) e^{−x} x^x to estimate the binomial
coefficient. Together we obtain:

$$E[X] = n\cdot\Pr\Big[B\Big(n, \frac1n\Big) \ge k_\alpha\Big] = n\cdot(1 + o(1))\cdot b\Big(k_\alpha; n, \frac1n\Big) = n\cdot(1 + o(1))\binom{n}{k_\alpha}\Big(\frac1n\Big)^{k_\alpha}\Big(1 - \frac1n\Big)^{n - k_\alpha}$$

$$= n\cdot(1 + o(1))\,\frac{1}{e\,k_\alpha!} = n\cdot e^{-k_\alpha\left(\log^{(2)} n - \log^{(3)} n + O(1)\right)} = n^{1 - \alpha + o(1)},$$

which implies the statement of equation (7).

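To complete the proof for the case m = n we still have to verify that
E[X_iX_j] ≤ (1 + o(1))(E[X_i])² for all i ≠ j. In order to keep the proof elementary we shall
proceed similarly as in the proof of Lemma 1. A more elegant version can be
found in § 5.

$$E[X_iX_j] = \Pr[Y_i \ge k_\alpha \wedge Y_j \ge k_\alpha] = \sum_{k_1 = k_\alpha}^{n - k_\alpha}\ \sum_{k_2 = k_\alpha}^{n - k_1}\binom{n}{k_1}\binom{n - k_1}{k_2}\Big(\frac1n\Big)^{k_1 + k_2}\Big(1 - \frac2n\Big)^{n - (k_1 + k_2)}$$

$$\le \sum_{k_1 \ge k_\alpha}\ \sum_{k_2 \ge k_\alpha}\binom{n}{k_1}\binom{n}{k_2}\Big(\frac1n\Big)^{k_1 + k_2}\Big(1 - \frac1n\Big)^{2n - 2(k_1 + k_2)}$$

$$\le \left(\binom{n}{k_\alpha}\Big(\frac1n\Big)^{k_\alpha}\Big(1 - \frac1n\Big)^{n - 2k_\alpha}\sum_{i \ge 0}\lambda^i\right)^2,$$

where λ := n(n − k_α)/((k_α + 1)(n − 1)²) bounds the ratio of consecutive terms
of each inner sum. As λ = o(1), (1 − 1/n)^{−k_α} = 1 + o(1), and
b(k_α; n, 1/n) = (1 + o(1))E[X_i] (cf. Lemma 1), this concludes the proof of (6).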



5 The general case 

For the proof of Theorem 1 we will follow the same pattern as in the proof
of the previous section. The main difference is that in various parts we need
better bounds. We start by collecting some bounds on the tails of the binomial
distribution.



5.1 Tails of the binomial distribution 

The binomial distribution is very well studied. In particular it is well known that
the binomial distribution B(m, p) tends to the normal distribution if 0 < p < 1
is a fixed constant and m tends to infinity. If on the other hand p = p(m)
depends on m in such a way that mp converges to a constant λ for m tending
to infinity, then the corresponding binomial distribution B(m, p) tends to the
Poisson distribution with parameter λ. For these two extreme cases very
good bounds on the tail of the binomial distribution are also known. In the context
of our "balls and bins" scenario, however, we are interested in the whole spectrum
of values p = p(m).
In this section we collect some bounds on the tails of the binomial distribution
which are tailored to the proof of Theorem 1.

For values of p(m) such that mp tends to infinity one can analyze the
proof of the theorem of DeMoivre-Laplace to get asymptotic formulas for
Pr[B(m, p) ≥ mp + h] for all values h that are not "too" large:

Theorem 2 (DeMoivre-Laplace). Assume 0 < p < 1 depends on n such
that pqm = p(1 − p)m → ∞ for m → ∞. If 0 < h = x(pqm)^{1/2} = o((pqm)^{2/3})
and x → ∞ then

$$\Pr[B(m, p) \ge mp + h] = (1 + o(1))\cdot\frac{1}{x\sqrt{2\pi}}\,e^{-x^2/2}.$$

For an explicit proof of this version of the DeMoivre-Laplace theorem see,
e.g., [Bol85].
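To see what Theorem 2 says at moderate sizes, the sketch below compares the exact upper tail of B(m, p), summed in log space with `math.lgamma`, against the approximation (1/(x√(2π)))e^{−x²/2}. The theorem is asymptotic (x → ∞ and h = o((pqm)^{2/3})), so at finite m we expect only rough agreement; the parameter values are arbitrary choices for illustration.

```python
import math

def log_binom_pmf(k, m, p):
    """log b(k; m, p) via log-gamma, to avoid overflow for large m."""
    return (math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
            + k * math.log(p) + (m - k) * math.log1p(-p))

def exact_upper_tail(m, p, t):
    """Pr[B(m, p) >= t], summing the pmf terms with a log-sum-exp shift."""
    logs = [log_binom_pmf(k, m, p) for k in range(t, m + 1)]
    top = max(logs)
    return math.exp(top) * sum(math.exp(v - top) for v in logs)

def demoivre_laplace_tail(m, p, h):
    """(1 / (x sqrt(2 pi))) exp(-x^2 / 2) with x = h / sqrt(p q m)."""
    x = h / math.sqrt(p * (1 - p) * m)
    return math.exp(-x * x / 2) / (x * math.sqrt(2 * math.pi))


m, p = 10_000, 0.5
sigma = math.sqrt(p * (1 - p) * m)
for x in (3, 4):
    h = x * sigma
    exact = exact_upper_tail(m, p, math.ceil(m * p + h))
    print(x, exact, demoivre_laplace_tail(m, p, h))
```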

The probability that a binomially distributed random variable B(m, p) obtains
a value of size at least mp(1 + ε) for some constant ε > 0 is usually estimated
using the so-called Chernoff bounds. Recall, however, that Chernoff bounds
provide only an upper bound.

With the help of Lemma 1 in the previous section we are now in the position to
prove the tail bounds for those special cases of the binomial distribution which
we will need further on.



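Lemma 2. a) If mp + 1 ≤ t ≤ (log m)^ℓ for some positive constant ℓ, then

$$\Pr[B(m, p) \ge t] = e^{t(\log mp - \log t + 1) - mp + O(\log^{(2)} m)}.$$

b) If t = mp + o((pqm)^{2/3}) and x := (t − mp)/√(pqm) tends to infinity, then

$$\Pr[B(m, p) \ge t] = e^{-x^2/2 - \log x - \frac12\log 2\pi + o(1)}.$$

Proof. a) Using Stirling's formula x! = (1 + o(1))√(2πx) e^{−x} x^x we obtain:

$$b(t; m, p) = (1 + o(1))\,\frac{1}{\sqrt{2\pi t}}\Big(\frac{mp}{t}\Big)^t\Big(1 + \frac{t - mp}{m - t}\Big)^{m - t}.$$

Together with Lemma 1 we thus get for log Pr[B(m, p) ≥ t] the following ex-
pression:

$$\log\Big(1 + O\Big(\frac{mp}{t - mp}\Big)\Big) + t(\log mp - \log t + 1) - mp - \frac12\log t + O\Big(\frac{t^2}{m}\Big) + O(1).$$

The term O(t²/m) gets arbitrarily small if mp ≤ t = o(√m), because
(t − mp)²/(m − t) ≤ t²/(m(1 − o(1))) = o(1). Furthermore, mp/(t − mp) ≤ mp ≤ (log m)^ℓ
and log t = O(log log m) by the assumption mp + 1 ≤ t ≤ (log m)^ℓ. That is,
log(1 + O(mp)) = O(log log m), and all error terms are O(log^{(2)} m).

b) This case is simply a reformulation of the DeMoivre-Laplace theorem. □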
5.2 Proof of Theorem 1

We follow the setup outlined in § 2. That is, we only have to show that the
variables X_i satisfy condition (3) and that the expectation of X = Σ X_i tends
either to infinity or to zero depending on whether α is smaller or greater
than 1. We start with the latter.

Lemma 3. Let k_α be defined as in Theorem 1. Then

$$\log E[X] \to \begin{cases}\infty, & \text{if } 0 < \alpha < 1,\\ -\infty, & \text{if } \alpha > 1,\end{cases}$$

for all values m = m(n) ≥ n/polylog(n).

Proof. The case n/polylog(n) ≤ m ≪ n log n.

We first note that it suffices to consider non-negative α's. Assume that
m = n log n/g, where g = g(n) tends to infinity arbitrarily slowly and g(n) ≤ polylog(n).
Then

$$k_\alpha = \frac{\log n}{\log g}\Big(1 + \alpha\,\frac{\log^{(2)} g}{\log g}\Big).$$

From equation (5) and Lemma 2 case a) (we leave it to the reader to verify that
this case may be applied) it follows that

$$\log E[X] = \log n + k_\alpha\big(\log^{(2)} n - \log g - \log k_\alpha + 1\big) - \frac{\log n}{g} + O(\log^{(2)} n) = (1 - \alpha + o(1))\,\frac{\log n\cdot\log^{(2)} g}{\log g},$$

which yields the desired result.

The case m = c · n log n.

Let k_α := (d_c − 1 + α) log n. By Lemma 2 we get:

$$\log E[X] = \log n\,\big(1 + (d_c - 1 + \alpha)(\log c - \log(d_c - 1 + \alpha) + 1) - c + o(1)\big).$$

As a consequence, for α = 1, log E[X] is o(log n) exactly when d_c is a
solution of

$$f_c(x) := 1 + x(\log c - \log x + 1) - c = 0.$$

The function f_c is strictly concave on (0, ∞) with maximum f_c(c) = 1 and tends
to −∞ as x → ∞. Hence, for c > 1 this equation admits exactly two real zeros
x_1 < c < x_2, while for 0 < c ≤ 1 it has only one positive zero, which is greater
than c. A zero smaller than c is not the one we are looking for. That is, we define
d_c as the (unique) solution of f_c(x) = 0 that is greater than c. In the
neighborhood of its zeros, f_c(x) changes its sign. This means that for
k_α = (d_c − 1 + α) log n with a given α > 1, log E[X] tends to −∞, whereas for
0 < α < 1, log E[X] tends to ∞.
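The constant d_c has no closed form in general, but it is easy to compute. Since f_c decreases on (c, ∞), starts at f_c(c) = 1 > 0 and tends to −∞, bisection on that interval finds it. The sketch below is our own illustration (natural logarithms, arbitrary tolerance). For c = 1 the equation reduces to x(1 − log x) = 0, so d_1 = e.

```python
import math

def f_c(c, x):
    """f_c(x) = 1 + x (log c - log x + 1) - c, natural logarithms."""
    return 1 + x * (math.log(c) - math.log(x) + 1) - c

def d_c(c, tol=1e-12):
    """Zero of f_c greater than c, by bisection (f_c decreases on (c, inf))."""
    lo, hi = c, 2 * c + 1          # f_c(c) = 1 > 0
    while f_c(c, hi) > 0:          # f_c(x) -> -inf, so this loop terminates
        hi *= 2
    while hi - lo > tol * hi:
        mid = (lo + hi) / 2
        if f_c(c, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for c in (0.5, 1.0, 2.0, 10.0):
    print(c, d_c(c))
```

With d_c in hand, the threshold in this regime is k_α = (d_c − 1 + α) log n.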
The case n log n ≪ m ≤ n · polylog(n).

Assume that m = g n log n where g = g(n) ≤ polylog(n) tends to infinity arbi-
trarily slowly. Then

$$k_\alpha = g\log n\Big(1 + \alpha\sqrt{\frac{2}{g}}\Big).$$

From Lemma 2 case a) it follows, with ε := α√(2/g), that

$$\log E[X] = \log n + k_\alpha\big(\log g + \log^{(2)} n - \log k_\alpha + 1\big) - g\log n + O(\log^{(2)} n)$$

$$= \log n + g\log n\,\big((1 + \varepsilon)(1 - \log(1 + \varepsilon)) - 1\big) + O(\log^{(2)} n) = \log n\,(1 - \alpha^2 + o(1)),$$

since (1 + ε)(1 − log(1 + ε)) − 1 = −ε²/2 + O(ε³) and gε² = 2α².

One easily checks that we did not violate the conditions of Lemma 2.

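The case m ≫ n(log n)³.

For this case we shall use the theorem of DeMoivre-Laplace. Recall that in
this case

$$k_\alpha = \frac{m}{n} + \sqrt{\frac{2m\log n}{n}\Big(1 - \frac{1}{\alpha}\,\frac{\log^{(2)} n}{2\log n}\Big)}.$$

Using the notation of Lemma 2 case b) we set

$$h := k_\alpha - \frac{m}{n}, \qquad x := \frac{k_\alpha - mp}{\sqrt{mpq}} = \sqrt{2\log n\Big(1 - \frac{1}{2\alpha}\,\frac{\log^{(2)} n}{\log n}\Big)\Big(1 + \frac{1}{n - 1}\Big)}.$$

Applying DeMoivre-Laplace we obtain:

$$\log E[X] = \log n - \frac{x^2}{2} - \log x - \frac12\log 2\pi + o(1) = \Big(\frac{1}{\alpha} - 1\Big)\frac{\log^{(2)} n}{2} + O(1).$$

We still need to check that we did not violate the conditions of DeMoivre-
Laplace, i.e., that k_α − m/n = o((pqm)^{2/3}), but this is true if m/n = ω(log³ n).
□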



In order to show that the variables X_i satisfy the second part of condition (3)
(note that the first part is trivially true) we start with two simple lemmas.

Lemma 4. Let p ≤ 1/2 and m be such that p²m = o(1). Then

$$\Pr\Big[B\Big(m(1 - p), \frac{p}{1 - p}\Big) \ge t\Big] \le (1 + o(1))\cdot\Pr[B(m, p) \ge t] \quad\text{for all } 0 \le t \le m.$$
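Proof. We will show that for all $0 \le t \le m$ we have $b\left(t; m(1-p), \tfrac{p}{1-p}\right) \le (1+o(1))\, b(t; m, p)$. Clearly, this then completes the proof of the lemma. So consider an arbitrary but fixed $0 \le t \le m$. For $t > m(1-p)$ we have $b\left(t; m(1-p), \tfrac{p}{1-p}\right) = 0$, so we may as well assume that $t \le m(1-p)$. Then
$$b\left(t; m(1-p), \tfrac{p}{1-p}\right) = \binom{m}{t}\prod_{i=0}^{t-1}\frac{m(1-p)-i}{m-i}\cdot\left(\frac{p}{1-p}\right)^{t}\left(1-\frac{p}{1-p}\right)^{m(1-p)-t}$$
$$\le \binom{m}{t}(1-p)^{t}\left(\frac{p}{1-p}\right)^{t}\left(1-\frac{p}{1-p}\right)^{m(1-p)}\left(1-\frac{p}{1-p}\right)^{-t}$$
$$\le \binom{m}{t}p^{t}(1-p)^{m}\left(1-\frac{p}{1-p}\right)^{-t}$$
$$= b(t; m, p)\cdot\left(1+\frac{p^2}{1-2p}\right)^{t} \le b(t; m, p)\cdot e^{2p^2 m} = b(t; m, p)\cdot(1+o(1)).$$
Here we used that each factor of the product is at most $1-p$, that $\left(1-\frac{p}{1-p}\right)^{1-p} \le 1-p$ for $0 < p \le 1/4$, and that $t \le m$ and $\frac{1}{1-2p} \le 2$.
□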
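The per-term inequality in this proof can be checked exactly for small parameters. The sketch below is an illustration under our own assumptions: it takes $p = 1/n$ with n dividing m, so that $m(1-p)$ is an integer, uses Python's exact `fractions` module, and reports the largest ratio of the point probabilities and of the upper tails over all $0 \le t \le m$, next to the factor $e^{2p^2m}$ appearing in the proof. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb, exp

def pmf(M, q, t):
    """Exact binomial probability b(t; M, q) for integer M and rational q."""
    if t < 0 or t > M:
        return Fraction(0)
    return comb(M, t) * q**t * (1 - q) ** (M - t)

def lemma4_ratios(n, m):
    """Largest ratios of b(t; m(1-p), p/(1-p)) to b(t; m, p), pointwise and for tails."""
    assert m % n == 0, "choose n | m so that m(1 - p) is an integer"
    p = Fraction(1, n)
    M, q = m - m // n, p / (1 - p)           # M = m(1 - p), q = p/(1 - p)
    worst_term = max(pmf(M, q, t) / pmf(m, p, t) for t in range(m + 1))
    worst_tail = Fraction(0)
    tail_small = tail_big = Fraction(0)
    for t in range(m, -1, -1):                 # accumulate Pr[. >= t] from the top
        tail_small += pmf(M, q, t)
        tail_big += pmf(m, p, t)
        worst_tail = max(worst_tail, tail_small / tail_big)
    bound = exp(2 * float(p) ** 2 * m)         # the factor e^{2 p^2 m}
    return float(worst_term), float(worst_tail), bound

for n, m in [(20, 100), (8, 40), (100, 200)]:
    print(n, m, lemma4_ratios(n, m))
```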



Lemma 5. Let $p = o(1)$ and $m, t$ be such that $x := \frac{t - mp}{\sqrt{mp(1-p)}}$ satisfies $x \to \infty$, $x = o\bigl((mp(1-p))^{1/6}\bigr)$ and $xp = o(1)$. Then
$$\Pr\left[B\left(m(1-p), \tfrac{p}{1-p}\right) \ge t\right] \le (1+o(1))\Pr[B(m,p) \ge t].$$



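Proof. Observe that the assumptions of the lemma are such that we may apply case b) of Lemma 2 to compute $\Pr[B(m,p) \ge t]$. Observe furthermore that we may also apply this case of Lemma 2 to bound $\Pr\left[B\left(m(1-p), \tfrac{p}{1-p}\right) \ge t\right]$, as here the corresponding $x$-value is
$$x' = \frac{t - m(1-p)\cdot\frac{p}{1-p}}{\sqrt{m(1-p)\cdot\frac{p}{1-p}\cdot\left(1 - \frac{p}{1-p}\right)}} = \frac{t - mp}{\sqrt{mp(1-p)}}\cdot\sqrt{\frac{(1-p)^2}{1-2p}} = x\sqrt{1 + \frac{p^2}{1-2p}}.$$
Together we deduce
$$\Pr\left[B\left(m(1-p), \tfrac{p}{1-p}\right) \ge t\right] = e^{-\frac{x^2}{2}\left(1+\frac{p^2}{1-2p}\right) - \log x - \frac{1}{2}\log\left(1+\frac{p^2}{1-2p}\right) - \frac{1}{2}\log 2\pi + o(1)}$$
$$= \Pr[B(m,p) \ge t]\cdot e^{-O(x^2p^2) - O(p^2) + o(1)} = \Pr[B(m,p) \ge t]\cdot(1+o(1)).$$
□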
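The rescaling of the $x$-value in this proof is easy to confirm numerically. In this sketch (arbitrary illustrative values m = 10^7, p = 10^{-3}, t = 10 300) the standardized threshold of $B(m(1-p), \frac{p}{1-p})$ is compared with $x\sqrt{1 + p^2/(1-2p)}$, and the extra term in the exponent, $\frac{x^2}{2}\cdot\frac{p^2}{1-2p}$, is seen to be of order $x^2p^2$. The helper name `standardize` is our own.

```python
import math

def standardize(t, trials, prob):
    """x-value (t - mean) / standard deviation of B(trials, prob)."""
    return (t - trials * prob) / math.sqrt(trials * prob * (1 - prob))

m, p, t = 10**7, 1e-3, 10_300          # illustrative: mean mp = 10^4, x about 3
x = standardize(t, m, p)
x_prime = standardize(t, m * (1 - p), p / (1 - p))
predicted = x * math.sqrt(1 + p * p / (1 - 2 * p))
extra = (x_prime**2 - x**2) / 2        # additional exponent, of order x^2 p^2
print(x, x_prime, predicted, extra)
```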
Corollary 1. Let $m = m(n)$ and $p = 1/n$ be such that $m > \log n$, and let $k_\alpha$ denote the value from Theorem 1. Then
$$\Pr\left[B\left(m(1-p), \tfrac{p}{1-p}\right) \ge k_\alpha\right] \le (1+o(1))\cdot\Pr[B(m,p) \ge k_\alpha].$$

Proof. One easily checks that for all $m = o(n^2)$ Lemma 4 applies and that for all $m \gg n(\log n)^3$ Lemma 5 applies. □

Lemma 6. Let $X_1, \ldots, X_n$ be defined as in § 3. Then for all $1 \le i < j \le n$
$$E[X_iX_j] \le (1+o(1))\cdot\bigl(E[X_i]\bigr)^2.$$

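Proof. Using the notation from § 3 we have
$$E[X_iX_j] = \Pr[Y_i \ge k_\alpha \wedge Y_j \ge k_\alpha] = \sum_{\substack{k_1, k_2 \ge k_\alpha \\ k_1 + k_2 \le m}} \binom{m}{k_1}\binom{m-k_1}{k_2} p^{k_1+k_2}(1-2p)^{m-k_1-k_2}$$
$$= \sum_{k_1=k_\alpha}^{m-k_\alpha} \binom{m}{k_1} p^{k_1}(1-p)^{m-k_1} \sum_{k_2=k_\alpha}^{m-k_1} \binom{m-k_1}{k_2}\left(\frac{p}{1-p}\right)^{k_2}\left(1-\frac{p}{1-p}\right)^{m-k_1-k_2}$$
$$= \sum_{k_1=k_\alpha}^{m-k_\alpha} \binom{m}{k_1} p^{k_1}(1-p)^{m-k_1}\cdot\Pr\left[B\left(m-k_1, \tfrac{p}{1-p}\right) \ge k_\alpha\right].$$
As $k_1 \ge k_\alpha > mp$, we have $m - k_1 \le m(1-p)$ and therefore
$$\Pr\left[B\left(m-k_1, \tfrac{p}{1-p}\right) \ge k_\alpha\right] \le \Pr\left[B\left(m(1-p), \tfrac{p}{1-p}\right) \ge k_\alpha\right] \le (1+o(1))\cdot\Pr[B(m,p) \ge k_\alpha],$$
where the last inequality follows from Corollary 1. Hence,
$$E[X_iX_j] \le \sum_{k_1=k_\alpha}^{m-k_\alpha} \binom{m}{k_1} p^{k_1}(1-p)^{m-k_1}\cdot(1+o(1))\cdot\Pr[B(m,p) \ge k_\alpha]$$
$$= (1+o(1))\cdot\Pr[B(m,p) \ge k_\alpha]\cdot\underbrace{\sum_{k_1=k_\alpha}^{m-k_\alpha} \binom{m}{k_1} p^{k_1}(1-p)^{m-k_1}}_{\le\,\Pr[B(m,p) \ge k_\alpha]} \le (1+o(1))\cdot\bigl(\Pr[B(m,p) \ge k_\alpha]\bigr)^2.$$
As $\Pr[B(m,p) \ge k_\alpha] = E[X_i]$, this completes the proof of the lemma.
□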
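The factorization at the start of this proof is an exact identity, so it can be verified with rational arithmetic for a small instance. In the sketch below (hypothetical values m = 30 balls, n = 5 bins, threshold k = 8), `joint_direct` sums the trinomial distribution of $(Y_i, Y_j)$ and `joint_factored` uses the decomposition via $\Pr[B(m-k_1, \frac{p}{1-p}) \ge k]$; the last line compares $E[X_iX_j]$ with $(E[X_i])^2$. All names are our own.

```python
from fractions import Fraction
from math import comb

def binom_pmf(M, q, s):
    """Exact b(s; M, q)."""
    return comb(M, s) * q**s * (1 - q) ** (M - s) if 0 <= s <= M else Fraction(0)

def binom_tail(M, q, t):
    """Exact Pr[B(M, q) >= t]."""
    return sum((binom_pmf(M, q, s) for s in range(max(t, 0), M + 1)), Fraction(0))

def joint_direct(m, n, k):
    """Pr[Y_i >= k and Y_j >= k] from the trinomial law of (Y_i, Y_j, other bins)."""
    p = Fraction(1, n)
    total = Fraction(0)
    for k1 in range(k, m + 1):
        for k2 in range(k, m - k1 + 1):
            total += (comb(m, k1) * comb(m - k1, k2)
                      * p ** (k1 + k2) * (1 - 2 * p) ** (m - k1 - k2))
    return total

def joint_factored(m, n, k):
    """Same probability via the factorization used in the proof of Lemma 6."""
    p = Fraction(1, n)
    q = p / (1 - p)
    return sum((binom_pmf(m, p, k1) * binom_tail(m - k1, q, k)
                for k1 in range(k, m - k + 1)), Fraction(0))

m, n, k = 30, 5, 8
joint = joint_direct(m, n, k)
single = binom_tail(m, Fraction(1, n), k)        # E[X_i]
print(float(joint), float(single) ** 2, float(joint / single**2))
```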
6 Conclusion

In this paper we derived an asymptotic formula for the maximum number of balls in any bin, if m = m(n) balls are thrown randomly into n bins, for all values of m ≥ n/polylog(n). Our proof is based on the so-called first and second moment method.
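The setting of the paper is easy to simulate. The following sketch (our own illustration; n = 1000, m = 2·10^5 and the seed are arbitrary) throws m balls into n bins a few times and prints the observed maximum loads next to $k_1 = m/n + \sqrt{2(m/n)\log n}$, which is $k_\alpha = g\log n\,(1 + \alpha\sqrt{2/g})$ with α = 1 and $m = g\,n\log n$. A handful of runs at one finite n says nothing about the asymptotic statement; it only shows the scale of the quantities involved.

```python
import math
import random
from collections import Counter

def max_load(m, n, rng):
    """Throw m balls independently and uniformly into n bins; return the maximum load."""
    loads = Counter(rng.randrange(n) for _ in range(m))
    return max(loads.values())

n, m = 1000, 200_000                 # m/n = 200, i.e. m = g n log n with g about 29
rng = random.Random(1)
observed = [max_load(m, n, rng) for _ in range(5)]
k_one = m / n + math.sqrt(2 * (m / n) * math.log(n))   # k_alpha for alpha = 1
print(observed, round(k_one, 1))
```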

The result for m = n was well known before. However, our method gives a much simpler proof than those previously available in the literature. To the best of our knowledge the result for the case m ≫ n is new. In our opinion it is a challenging open problem to study the behavior of the modified balls-into-bins scenario introduced in [ABKU92] for the case m > n as well. Extensive computational experiments seem to indicate that in this case the difference between the maximum load in any bin and the mean m/n should be independent of m. We intend to settle this problem in a forthcoming paper.
Abstract. We introduce Tornado codes, a new class of erasure codes. These randomized codes have linear-time encoding and decoding algorithms. They can be used to transmit over lossy channels at rates extremely close to capacity. The encoding and decoding algorithms for Tornado codes are both simple and faster by orders of magnitude than the best software implementations of standard erasure codes. We expect Tornado codes will be extremely useful for applications such as reliable distribution of bulk data, including software distribution, video distribution, news and financials distribution, popular web site access, database replication, and military communications.

Despite the simplicity of Tornado codes, their design and analysis are 
mathematically interesting. The design requires the careful choice of a 
random irregular bipartite graph, where the structure of the irregular 
graph is extremely important. We model the progress of the decoding 
algorithm by a simple AND-OR tree analysis which immediately gives 
rise to a polynomial in one variable with coefficients determined by the 
graph structure. Based on these polynomials, we design a graph structure 
that guarantees successful decoding with high probability. 
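The abstract does not spell out the polynomial. A common form of this analysis for erasure decoding, which we assume here, iterates $x_{i+1} = \delta\,\lambda(1 - \rho(1 - x_i))$, where δ is the erasure probability and λ, ρ are the edge degree polynomials of the left and right sides of the bipartite graph; decoding succeeds when the iteration is driven to 0. The sketch uses a hypothetical regular graph with left degree 3 and right degree 6 (so λ(x) = x², ρ(x) = x⁵), not one of the irregular graphs the abstract refers to, and two erasure rates on either side of that graph's threshold of roughly 0.43.

```python
def erasure_fraction(delta, lam, rho, iterations=2000):
    """Iterate x <- delta * lam(1 - rho(1 - x)) from x = delta and return the final x.

    Under the assumed analysis, x approximates the probability that a message
    on an edge is still erased after that many rounds of peeling.
    """
    x = delta
    for _ in range(iterations):
        x = delta * lam(1 - rho(1 - x))
    return x

def lam(x):
    return x ** 2      # left degree 3, edge perspective (hypothetical example)

def rho(x):
    return x ** 5      # right degree 6, edge perspective (hypothetical example)

for delta in (0.40, 0.45):
    print(delta, erasure_fraction(delta, lam, rho))
```

Below the threshold the iterate collapses to 0; above it the iteration stalls at a positive fixed point, which is exactly the condition a graph design has to rule out.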
Abstract. Redundancy has been utilized to achieve fault tolerant computation and to achieve reliable communication in networks of processors. These techniques can only be extended to computations solely based on functions in one input in which redundant hardware or software (servers) are used to compute intermediate and end results. However, almost all practical computation systems consist of components which are based on computations with multiple inputs. Wang, Desmedt, and Burmester have used AND/OR graphs to model this scenario. Roughly speaking, an AND/OR graph is a directed graph with two types of vertices, labeled ∧-vertices and ∨-vertices. In this case, processors which need all their inputs in order to operate could be represented by ∧-vertices, whereas processors which can choose one of their “redundant” inputs could be represented by ∨-vertices. In this paper, using the results for hardness of approximation and optimization problems, we will design dependable computation systems which could defeat as many malicious faults as possible. Specifically, assuming a certain approximation hardness result, we will construct k-connected AND/OR graphs which could defeat a ck-active adversary (therefore a ck-passive adversary also) where c > 1 is any given constant. This result improves a great deal on the results for the equivalent communication problems.



1 Introduction 

Redundancy has been utilized to achieve reliability, for example to achieve fault tolerant computation and to achieve reliable communication in networks of processors. One of the primary objectives of a redundant computation system is to tolerate as many faults (accidental or malicious) as possible. Hence, one of the crucial requirements in designing redundant computation systems is to use the least resources (redundancy) to achieve dependable computation against the most powerful adversaries. It has been proven (see, e.g., Hadzilacos [15], Dolev [10], Dolev, Dwork, Waarts, and Yung [11], and Beimel and Franklin [4]) that in the presence of a k-passive adversary (respectively k-active adversary) the processors in a network can communicate reliably if and only if the network is (k + 1)-connected (respectively (2k + 1)-connected).



All these works mentioned above assume processors with one type of input, while in practice it is often the case that processors need more than one type of input. For example, for the national traffic control system, we need data from the aviation, rail, highway, and aquatic vehicles, conduits, and support systems by which people and goods are moved from a point-of-origin to a destination point in order to support and complete matters of commerce, government operations, and personal affairs. In addition, each component of the traffic control system is again a system consisting of computations with multiple inputs, e.g., the processors of the aviation control system need data from several sources such as the airplane’s speed, current position, etc., to determine the airplane’s next position. Wang, Desmedt, and Burmester [22] have used AND/OR graphs to model this scenario. Originally AND/OR graphs were used in the context of artificial intelligence to model problem solving processes (see [17]). Roughly speaking, an AND/OR graph is a directed graph with two types of vertices, labeled ∧-vertices and ∨-vertices. The graph must have at least one input (source) vertex and one output (sink) vertex. In this case, processors which need all their inputs in order to operate could be represented by ∧-vertices, whereas processors which can choose (using some kind of voting procedure) one of their “redundant” inputs could be represented by ∨-vertices. A solution graph, which describes a valid computation of the system, is a minimal subgraph of an AND/OR graph with the following properties: if an ∧-vertex is in the solution graph then all of its incoming edges (and incident vertices) belong to the solution graph; if a ∨-vertex is in the solution graph then exactly one of its incoming edges (and the incident vertex) belongs to the solution graph. Wang, Desmedt, and Burmester [22] showed that it is NP-hard to find vertex disjoint solution graphs in an AND/OR graph (though there is a polynomial time algorithm for finding vertex disjoint paths in networks of processors with one type of input). This result shows that in order to achieve dependable computation, the computation systems (networks of processors) must be designed in such a way that it is easy for the honest stations/agents to find the redundant information in the systems. A similar analysis as for the case of networks of processors with one type of input shows that in the presence of a k-passive adversary (respectively k-active adversary) the computation system modeled by an AND/OR graph is dependable if and only if the underlying graph (that is, the AND/OR graph) is (k + 1)-connected (respectively (2k + 1)-connected) and both the input vertices and the output vertex know the set of vertex disjoint solution graphs in the AND/OR graph. Later in this paper, we will use $G_{\wedge\vee}$ to denote AND/OR graphs and $G$ to denote standard undirected graphs unless specified otherwise.

What happens if we want to tolerate more powerful adversaries? Adding more channels is costly, so we suggest a simpler solution: designing the AND/OR graph in such a way that it is hard for the adversary to find a vertex separator of the maximum set of vertex disjoint solution graphs (that is, to find at least one vertex on each solution graph in the maximum set of vertex disjoint solution graphs), whence the adversary does not know which processors to block (or control). In order to achieve this purpose, we need some stronger results for approximation and optimization problems. There have been many results (see, e.g., [1, 21] for a survey) on the hardness of approximating an NP-hard optimization problem within a factor c from “below”. For example, it is hard to compute an independent set^ V' of a graph G(V, E) (note that here G is a graph in the standard sense instead of being an AND/OR graph) with the property that |V'| ≥ k/c for some given factor c, where k is the size of the maximum independent set of G. But for our problem, we are more concerned with approximating an NP-hard optimization problem from “above”. For example, given a graph G(V, E), how hard is it to compute a vertex set V' of G with |V'| ≤ ck such that V' contains an optimal independent set of G, where k is the size of the optimal independent set of G? We show that this kind of approximation problem is also NP-hard. Then we will use this result to design dependable computation systems such that with k redundant computation paths we can achieve dependable computation against a ck-active (Byzantine style) adversary (therefore against a ck-passive adversary also), where c > 1 is any given constant. This result improves a great deal on the equivalent communication problems (see our discussion on related works below).

The organization of this paper is as follows. We first prove in Section 2 the following result: for any given constant c > 1, it is NP-hard to compute a vertex set V' of a given graph G(V, E) with the properties that |V'| ≤ ck and V' contains an optimal independent set of G(V, E), where k is the size of the optimal independent set of G(V, E). Section 3 surveys a model for fault tolerant computation and describes the general threats to dependable computation systems. In Section 4 we demonstrate how to use AND/OR graphs with trap-doors to achieve dependable computation against passive (and active) adversaries. In Section 5 we outline an approach to build AND/OR graphs with trap-doors. We conclude in Section 6 with remarks towards practical solutions and we present some open problems.



Related work 

Achieving processor cooperation in the presence of faults is a major problem in 
distributed systems. Popular paradigms such as Byzantine agreement have been 
studied extensively. Dolev [10] (see also, Dolev, Dwork, Waarts, and Yung [11]) 
showed that a necessary condition for achieving Byzantine agreement is that the 

^ An independent set in a graph G(V, E) is a subset V' of V such that no two vertices 
in V' are joined by an edge in E. 
number of faulty processors in the system is less than one-half of the connectivity of the system’s network (note that in order to achieve Byzantine agreement, one also needs that n > 3k where n is the number of processors in the network and k is the number of faulty processors). Hadzilacos [15] has shown that even in the absence of malicious failures connectivity k + 1 is required to achieve agreement in the presence of k faulty processors. Beimel and Franklin [4] have shown that if authentication techniques are used, then Byzantine agreement is achievable only if the graph of the underlying network is (k + 1)-connected and the union of the authentication graph and the graph of the underlying network is (2k + 1)-connected in the presence of k faulty processors. All these works assume
processors with one type of inputs. Recently, Wang, Desmedt, and Burmester [22] 
have considered the problem of dependable computation with multiple inputs, 
that is, they considered the networks of processors where processors may have 
more than one type of inputs. While there is a polynomial time algorithm for 
finding vertex disjoint paths in networks of processors with one type of inputs, 
Wang, Desmedt, and Burmester’s work shows that the equivalent problem in 
computation with multiple inputs is NP-hard. 

Approximating an NP-hard optimization problem within a factor of 1 + ε means computing solutions whose “cost” is within a multiplicative factor 1 + ε of the cost of the optimal solution. Such a solution would suffice in practice if ε were close enough to 0. The question of approximability started receiving attention soon after NP-completeness was discovered [14, 20] (see [14] for a discussion). The most successful attempt was due to Papadimitriou and Yannakakis [18], who proved that MAX-3SAT (a problem defined by them) is complete for MAX-SNP (a complexity class defined by them); in other words, any approximability result for MAX-3SAT transfers automatically to a host of other problems. Among other results, they have shown that there is a constant ε > 0 such that it is NP-hard to compute a size k/(1 + ε) independent set of a given graph G, where k is the size of the maximum independent set of G. The results in [18] have been improved by many other authors, especially after the emergence of the PCP theorem [2, 3], that is, PCP(log n, 1) = NP (for a survey, see, e.g., [1, 21]). For example, Arora, Lund, Motwani, Sudan, and Szegedy have shown that it is NP-hard to n^δ-approximate an independent set for some δ > 0. However, all these results are related to approximating the independent set from “below”, that is, to computing an independent set whose size is smaller than the optimal independent set. We will show that it is easy to convert these results to results on the hardness of approximating an independent set from “above” instead of from “below” as done in [1, 21], that is, it is hard to delete some vertices from a given graph such that the resulting graph contains an optimal independent set of the original graph.



2 Optimization and approximation 

In this section we present some graph theoretic results which will be used in later sections. First we remind the reader of the graphs defined in the transformation from 3SAT to Vertex Cover^ in Garey and Johnson [14, pp. 54-56] and we give such graphs a special name.

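Definition 1. Let n and m be two positive integers. A graph G(V, E) is called an n + m-SAT-graph if there are n + m subgraphs $L_1(V_{L_1}, E_{L_1}), \ldots, L_n(V_{L_n}, E_{L_n}), T_1(V_{T_1}, E_{T_1}), \ldots, T_m(V_{T_m}, E_{T_m})$ of G with the following properties:

1. $V = \left(\bigcup_{i=1}^{n} V_{L_i}\right) \cup \left(\bigcup_{i=1}^{m} V_{T_i}\right)$.

2. For each $i \le n$, $|V_{L_i}| = 2$ and $E_{L_i}$ consists of the one edge connecting the two vertices in $V_{L_i}$.

3. For each $i \le m$, $T_i$ is a triangle, which is isomorphic to the undirected graph $T(V_T, E_T)$ where $V_T = \{v_1, v_2, v_3\}$ and $E_T = \{(v_1, v_2), (v_2, v_3), (v_3, v_1)\}$.

4. There is a function $f : \bigcup_{i=1}^{m} V_{T_i} \to \bigcup_{i=1}^{n} V_{L_i}$ such that the edge set of G is $E = \left(\bigcup_{i=1}^{n} E_{L_i}\right) \cup \left(\bigcup_{i=1}^{m} E_{T_i}\right) \cup \left\{(v, f(v)) : v \in \bigcup_{i=1}^{m} V_{T_i}\right\}$.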

The following results are straightforward from the definitions. 

Lemma 1. Given an n + m-SAT-graph G(V, E), the following conditions hold.

1. The size of an independent set of G is at most n + m. 

2. The size of a vertex cover of G is at least n + 2m. 

The following result is proved in [14, pp. 54-56]. 

Lemma 2. (see [14]) Given a 3SAT formula C with n variables and m clauses, there is an n + m-SAT-graph G(V, E) with the following properties:

1. C is satisfiable if and only if there is an independent set of size n + m in G(V, E).

2. C is satisfiable if and only if there is a vertex cover of size n + 2m in G(V, E).
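To make Definition 1 and Lemma 2 concrete, the sketch below builds the n + m-SAT-graph of a small 3SAT formula (one edge per variable, one triangle per clause, each triangle vertex joined to the vertex of its literal) and computes the maximum independent set by brute-force branching. The encoding of literals as signed integers and the helper names are our own; the exponential search is only meant for tiny formulas. The second formula repeats a literal within a clause, which Definition 1 permits because f need not be injective.

```python
from itertools import combinations

def sat_graph(num_vars, clauses):
    """n + m-SAT-graph of a 3SAT formula; literal +i is x_i, -i is its negation."""
    edges = set()
    for i in range(1, num_vars + 1):
        edges.add(frozenset({("lit", i), ("lit", -i)}))         # the edge L_i
    for j, clause in enumerate(clauses):
        assert len(clause) == 3
        triangle = [("cl", j, r) for r in range(3)]                # the triangle T_j
        for a, b in combinations(triangle, 2):
            edges.add(frozenset({a, b}))
        for r, lit in enumerate(clause):
            assert 1 <= abs(lit) <= num_vars
            edges.add(frozenset({triangle[r], ("lit", lit)}))      # edge (v, f(v))
    vertices = {v for e in edges for v in e}
    return vertices, edges

def max_independent_set_size(vertices, edges):
    """Exact maximum independent set size by simple branching (small graphs only)."""
    nbrs = {v: set() for v in vertices}
    for e in edges:
        a, b = tuple(e)
        nbrs[a].add(b)
        nbrs[b].add(a)

    def solve(cand):
        if not cand:
            return 0
        v = max(cand, key=lambda u: len(nbrs[u] & cand))
        if not nbrs[v] & cand:               # no edges left: take every candidate
            return len(cand)
        rest = cand - {v}
        return max(solve(rest), 1 + solve(rest - nbrs[v]))   # exclude v / include v

    return solve(frozenset(vertices))

satisfiable = [(1, 2, -3), (-1, 3, 2)]       # e.g. x1 = x2 = True satisfies it
unsatisfiable = [(1, 1, 1), (-1, -1, -1)]    # demands x1 and not x1
for num_vars, formula in [(3, satisfiable), (1, unsatisfiable)]:
    V, E = sat_graph(num_vars, formula)
    print(num_vars + len(formula), max_independent_set_size(V, E))
```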

Corollary 1. It is NP-hard to decide whether there is an independent set of size n + m in an n + m-SAT-graph.

In addition to the problem of deciding whether an n + m-SAT-graph has an independent set of size n + m, we are also interested in the following approximation problem: for some constant ε > 0 and each n + m-SAT-graph G, can we compute in polynomial time an independent set of size k/(1 + ε) in G, where k is the size of the maximum independent set of G? Papadimitriou and Yannakakis [18] (see also [13, 2]) have proved the following result (note that their original result is for general graphs, though their proof is for n + m-SAT-graphs).

Definition 2. For a rational number ε > 0, an algorithm is said to compute a (1 + ε)-approximation to the maximum independent set if, given any graph G, its output is an independent set of G with size at least k/(1 + ε), where k is the size of the maximum independent set of G.

^ A vertex cover of a graph G(V, E) is a subset V' of V such that every edge in E is 
incident to a vertex in V' . 
Theorem 1. (see [18, 13]) There is a constant ε > 0 such that approximating an independent set of an n + m-SAT-graph G(V, E) within a factor 1 + ε is NP-hard.

Arora, Lund, Motwani, Sudan, and Szegedy [2] have proved the following 
stronger result. 

Theorem 2. (see [2]) There is a constant δ > 0 such that approximating an independent set of a graph G within a factor n^δ is NP-hard, where n is the number of vertices in G.

Note that Theorem 2 is only for general graphs. The following variants of 
Theorems 1 and 2 are useful for our discussions. 

Theorem 3. (see [18, 13]) There is a constant ε > 0 and a polynomial time algorithm to construct for each 3SAT formula C an n + m-SAT-graph G with the following properties:

1. If C is satisfiable, then G has an independent set of size n + m.

2. If C is not satisfiable, then k < (n + m)/(1 + ε), where k is the size of the maximum independent set in G.

Proof. It follows from the proof of Theorem 1. □

Theorem 4. (see [2, 5]) There is a constant δ > 0 and a series of pairs of positive integers (s_1, c_1), (s_2, c_2), ... such that for large enough n, c_n ≥ n^δ s_n, and from each 3SAT formula C we can construct in polynomial time a graph G with the following properties:

1. If C is satisfiable, then k ≥ c_n, where k is the size of the maximum independent set in G and n is the number of vertices in G.

2. If C is not satisfiable, then k < s_n, where k is the size of the maximum independent set in G and n is the number of vertices in G.

Proof. It follows from the proof of Theorem 2. □

Given a graph G(V, E), an edge set E' ⊆ E is said to be independence eligible if there is an independent set V' ⊆ {u : there is a v ∈ V such that the unordered pair (u, v) ∈ E'} of size |E'| in G. Note that given an independence eligible edge set E', it is easy to compute an independent set of size |E'| (by a standard algorithm for computing a satisfying assignment of a 2SAT formula).
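The 2SAT step mentioned above can be sketched as follows, under our own encoding: for each edge of E' a Boolean variable records which endpoint is picked, and every pair of picks that would coincide or be adjacent in G is forbidden by a 2-clause. The satisfiability test is the usual strongly-connected-components method (Kosaraju's algorithm here); all function names are hypothetical.

```python
from itertools import combinations

def _scc(num_nodes, adj):
    """Kosaraju: component ids increase along a topological order of the SCC DAG."""
    radj = [[] for _ in range(num_nodes)]
    for u in range(num_nodes):
        for w in adj[u]:
            radj[w].append(u)
    seen, order = [False] * num_nodes, []
    for s in range(num_nodes):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(adj[s]))]
        while stack:                                 # iterative DFS, record finish order
            u, it = stack[-1]
            w = next(it, None)
            if w is None:
                stack.pop()
                order.append(u)
            elif not seen[w]:
                seen[w] = True
                stack.append((w, iter(adj[w])))
    comp, label = [-1] * num_nodes, 0
    for s in reversed(order):                        # second pass on the reversed graph
        if comp[s] != -1:
            continue
        comp[s], stack = label, [s]
        while stack:
            u = stack.pop()
            for w in radj[u]:
                if comp[w] == -1:
                    comp[w] = label
                    stack.append(w)
        label += 1
    return comp

def two_sat(num_vars, clauses):
    """clauses: pairs of literals (var, value); returns a list of bools or None."""
    def node(var, val):
        return 2 * var + (0 if val else 1)           # node ^ 1 is the negated literal
    adj = [[] for _ in range(2 * num_vars)]
    for (x, vx), (y, vy) in clauses:
        a, b = node(x, vx), node(y, vy)
        adj[a ^ 1].append(b)                         # not a implies b
        adj[b ^ 1].append(a)                         # not b implies a
    comp = _scc(2 * num_vars, adj)
    if any(comp[2 * v] == comp[2 * v + 1] for v in range(num_vars)):
        return None
    return [comp[2 * v] > comp[2 * v + 1] for v in range(num_vars)]

def independent_set_from_edges(graph_edges, chosen):
    """Pick one endpoint per edge in `chosen` so that the picks form an independent set."""
    adjacent = {frozenset(e) for e in graph_edges}
    clauses = []
    for i, j in combinations(range(len(chosen)), 2):
        for vi in (True, False):                     # True: pick chosen[i][0]
            for vj in (True, False):
                a = chosen[i][0 if vi else 1]
                b = chosen[j][0 if vj else 1]
                if a == b or frozenset((a, b)) in adjacent:
                    clauses.append(((i, not vi), (j, not vj)))   # forbid this pair of picks
    assignment = two_sat(len(chosen), clauses)
    if assignment is None:
        return None
    return [edge[0 if pick else 1] for edge, pick in zip(chosen, assignment)]

path = [("a", "b"), ("b", "c"), ("c", "d")]
print(independent_set_from_edges(path, [("a", "b"), ("c", "d")]))
```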

Theorem 5. Let ε be the constant in Theorem 3. Then it is NP-hard to compute an edge set E' of a given n + m-SAT-graph G with the following properties:

1. |E'| ≤ (1 + ε)k, where k is the size of a maximum independent set of G.

2. E' contains an independence eligible edge set E'' such that |E''| = k.

Proof. It follows from Theorem 3. □
Theorem 6. There is a constant ε > 0 such that it is NP-hard to compute an edge set E' ⊆ E of a graph G(V, E) with the following properties:

1. |E'| ≤ k n^ε, where k is the size of the maximum independent set of G and n = |V|.

2. E' contains an independence eligible edge set E'' such that |E''| ≥ k/2.

Proof. Let s_n, c_n and δ be the constants in Theorem 4, and let ε = δ/2. We reduce the NP-complete problem 3SAT to the problem of this theorem. For each 3SAT formula C, construct a graph G(V, E) satisfying the conditions of Theorem 4, so that either k ≥ c_n or k < s_n. Let E' be an edge set satisfying the conditions of the theorem. Then it suffices to show that if |E'| ≥ c_n/2 then k ≥ c_n (therefore C is satisfiable), else k < s_n (therefore C is not satisfiable). If |E'| ≥ c_n/2 then, by the condition that |E'| ≤ k n^ε, we have c_n/2 ≤ k n^ε. That is, for large enough n,
$$k \ge \frac{c_n}{2n^{\varepsilon}} \ge \frac{n^{\delta} s_n}{2n^{\delta/2}} = \frac{n^{\delta/2}}{2}\, s_n > s_n.$$
Whence k ≥ c_n. Otherwise |E'| < c_n/2 and
$$\frac{k}{2} \le |E''| \le |E'| < \frac{c_n}{2}.$$
That is, k < c_n. Whence k < s_n.
□



Corollary 2. There is a constant ε > 0 such that it is NP-hard to compute a vertex set V' ⊆ V of a graph G(V, E) with the following properties:

1. |V'| ≤ k n^ε, where k is the size of the maximum independent set of G and n = |V|.

2. V' contains an independent set V'' of G(V, E) such that |V''| ≥ k/2.



3 General threats and models for dependable 
computations 

General threats A simple attack to defend against is that of a restricted adversary (called a passive adversary) who is allowed only to monitor communication channels and to jam (denial of service) several processors in the computation system, but is not allowed to infiltrate/monitor the internal contents of any processor of the computation system. Of course, a more realistic adversary is the active adversary (Byzantine faults), who can monitor all communication between processors and in addition tries to infiltrate the internal contents of several processors.

A passive adversary with the power of jamming up to k processors is called a k-passive adversary. An active adversary (Byzantine faults) may mount a more sophisticated attack, where he manages to compromise the security of several internal processors of the system, whereby he is now not only capable of monitoring the external traffic pattern and of jamming several processors but is also capable of examining and modifying every message and data item (that is, creating bogus messages and data) which passes through (or is stored at) these infiltrated processors. Thus, we define a k-active adversary as an adversary that can monitor all the communication lines between processors and also manages to examine and to modify the internal contents of up to k processors of the system. (Similar definitions were considered in the literature, see, for example, [7, 8, 12, 19] and references therein.)

Achieving processor cooperation in the presence of faults is a major problem in distributed systems, and has been studied extensively (see, e.g., [4, 10, 11, 15]). All these works assume processors with one type of inputs. Recently,
Wang, Desmedt, and Burmester [22] have considered the problem of dependable 
computation with multiple inputs, that is, they considered the networks of pro- 
cessors where processors may have more than one type of inputs. While there is 
a polynomial time algorithm for finding vertex disjoint paths in networks of pro- 
cessors with one type of inputs, Wang, Desmedt, and Burmester’s work shows 
that the equivalent problem in computation with multiple inputs is NP-hard. 
In this paper, we will consider redundant computation systems with multiple 
inputs which can be modeled by AND/OR graphs which we now briefly survey. 

Definition 3. (see [22]) An AND/OR graph $G_{\wedge\vee}(V_\wedge, V_\vee, \mathit{INPUT}, \mathit{output}; E)$ is a directed graph with a set $V_\wedge$ of ∧-vertices, a set $V_\vee$ of ∨-vertices, a set INPUT of input vertices, an output vertex $\mathit{output} \in V_\vee$, and a set of directed edges E. The vertices without incoming edges are input vertices and the vertex without outgoing edges is the output vertex.

It should be noted that the above definition of AND/OR graphs is different from the standard definition in artificial intelligence (see, e.g., [17]), in that the
directions of the edges are opposite. The reason is that we want to use the 
AND/OR graphs to model redundant computation systems. 

Assume that we use the AND/OR graph to model a fault tolerant compu- 
tation. So, information (for example, mobile codes) must flow from the input 
vertices to the output vertex. And a valid computation in an AND/OR graph 
can be described by a solution graph (the exact definition will be given below). 
However, if insider vertices may be faulty or even malicious, then the output vertex cannot trust that the result is authentic or correct. First we assume that there is only one k-passive adversary at any specific time. The theory of fault tolerant computation (see Hadzilacos [15]), trivially adapted to the AND/OR graph model, tells us that if there are k + 1 vertex disjoint paths (solution graphs) of information flow in the AND/OR graph then the vertex output will always succeed in getting at least one copy of the results. Second we assume that there is one k-active adversary at any specific time. Then the theory of fault tolerant computation (see, e.g., Dolev [10], Dolev et al. [11], and Beimel and Franklin [4]) tells us that if there are 2k + 1 vertex disjoint paths (solution graphs) of information flow in the AND/OR graph then the vertex output will always succeed in getting at least k + 1 identical results computed from the input vertices through vertex disjoint solution graphs, if output knows the layout of the graph. This implies that if output knows the layout of the graph then it can use a majority vote to decide whether the result is correct or not. It follows that in order to achieve dependable computation with redundancy, it is necessary to find a set of vertex disjoint solution graphs in a given AND/OR graph.

Definition 4. (see [22]) Let $G_{\wedge\vee}(V_\wedge, V_\vee, \mathit{INPUT}, \mathit{output}; E)$ be an AND/OR graph. A solution graph $P = (V_P, E_P)$ is a minimum subgraph of $G_{\wedge\vee}$ satisfying the following conditions.

1. $\mathit{output} \in V_P$.

2. For each ∧-vertex $v \in V_P$, all incoming edges of $v$ in $E$ belong to $E_P$.

3. For each ∨-vertex $v \in V_P$, there is exactly one incoming edge of $v$ in $E_P$.

4. There is a sequence of vertices $v_1, \ldots, v_n \in V_P$ such that $v_1 \in \mathit{INPUT}$, $v_n = \mathit{output}$, and $(v_i, v_{i+1}) \in E_P$ for each $i < n$.

Moreover, two solution graphs $P_1$ and $P_2$ are vertex disjoint if $(V_{P_1} \cap V_{P_2}) \subseteq (\mathit{INPUT} \cup \{\mathit{output}\})$. An AND/OR graph is called k-connected if the following conditions are satisfied.

1. There are k vertex disjoint solution graphs in $G_{\wedge\vee}$.

2. There do not exist k + 1 vertex disjoint solution graphs in $G_{\wedge\vee}$.
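Conditions 1 to 4 of Definition 4 and the disjointness test translate directly into code. The sketch below uses a representation of our own choosing (a dict mapping each non-input vertex to "and" or "or", and sets of directed edges) and does not check the minimality requirement; the small example graph is hypothetical.

```python
from collections import defaultdict

def is_solution_graph(kind, inputs, output, edges, vp, ep):
    """Check conditions 1-4 of Definition 4 for P = (vp, ep); minimality is not checked."""
    if output not in vp or not ep <= edges:
        return False
    if any(u not in vp or v not in vp for u, v in ep):
        return False
    incoming = defaultdict(set)
    for u, v in edges:
        incoming[v].add((u, v))
    for v in vp - inputs:
        used = incoming[v] & ep
        if kind[v] == "and" and used != incoming[v]:
            return False                    # condition 2: all incoming edges of an AND vertex
        if kind[v] == "or" and len(used) != 1:
            return False                    # condition 3: exactly one for an OR vertex
    succ = defaultdict(list)                # condition 4: an input-to-output path inside ep
    for u, v in ep:
        succ[u].append(v)
    stack = [s for s in inputs if s in vp]
    seen = set(stack)
    while stack:
        u = stack.pop()
        if u == output:
            return True
        for w in succ[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False

def vertex_disjoint(vp1, vp2, inputs, output):
    """Solution graphs may share only input vertices and the output vertex."""
    return (vp1 & vp2) <= inputs | {output}

# Hypothetical example: x needs both i1 and i2, y needs i3, the output takes either.
kind = {"x": "and", "y": "and", "out": "or"}
inputs = {"i1", "i2", "i3"}
edges = {("i1", "x"), ("i2", "x"), ("i3", "y"), ("x", "out"), ("y", "out")}
P1 = ({"i1", "i2", "x", "out"}, {("i1", "x"), ("i2", "x"), ("x", "out")})
P2 = ({"i3", "y", "out"}, {("i3", "y"), ("y", "out")})
bad = ({"i1", "x", "out"}, {("i1", "x"), ("x", "out")})     # x lacks its input from i2
print([is_solution_graph(kind, inputs, "out", edges, *P) for P in (P1, P2, bad)])
```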

In order for an adversary to attack the computation system, s/he does not 
need to find all vertex disjoint solution graphs in an AND/OR graph. For a 
passive adversary, s/he can choose to jam one vertex on each solution graph to 
corrupt the system. An active adversary needs to find one half of the vertices of a vertex separator (defined in the following).

Definition 5. Let $G_{\wedge\vee}$ be a k-connected AND/OR graph, and $\mathcal{P} = \{P_1, \ldots, P_k\}$ be a maximum set of vertex disjoint solution graphs in $G_{\wedge\vee}$. A set $S = \{v_1, \ldots, v_k\}$ of vertices in $G_{\wedge\vee}$ is called a vertex separator of $\mathcal{P}$ if for each solution graph $P_i \in \mathcal{P}$ $(i = 1, \ldots, k)$, $v_i \in V_{P_i}$.
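Definition 5 and the attack counts described just before it (a passive adversary must reach every solution graph, an active one at least half of them) can be expressed with two small helpers. The names are hypothetical and vertex sets are Python sets; for simplicity `graphs_hit` counts every intersection, including shared input or output vertices, which is our simplification rather than something the text specifies.

```python
def is_vertex_separator(separator, solution_vertex_sets):
    """Definition 5: v_i must lie in V_{P_i} for every i = 1, ..., k."""
    return (len(separator) == len(solution_vertex_sets)
            and all(v in vp for v, vp in zip(separator, solution_vertex_sets)))

def graphs_hit(corrupted, solution_vertex_sets):
    """Number of solution graphs containing at least one corrupted vertex."""
    return sum(1 for vp in solution_vertex_sets if vp & corrupted)

# Hypothetical vertex sets of k = 3 vertex disjoint solution graphs (inner vertices only)
P = [{"a1", "a2"}, {"b1"}, {"c1", "c2"}]
k = len(P)
corrupted = {"a1", "c2"}
print(is_vertex_separator(["a2", "b1", "c1"], P),
      graphs_hit(corrupted, P) == k,                 # passive: every graph jammed?
      graphs_hit(corrupted, P) >= k // 2 + 1)        # active: a majority reached?
```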

Remark: The problem of finding a vertex separator in an AND / OR graph 
is NP-hard which will be proved in Section 5. 

The question we are addressing in this paper is how to design AND/OR graphs with fewer vertex disjoint solution graphs to achieve dependable computation against more powerful passive or active adversaries.

4 Dependable computation with trap-doors 

In this section, we show how to design dependable computation systems with 
trap-doors such that the following condition is satisfied: 

- The computation system modeled by a k-connected AND/OR graph is robust against a k'-active adversary (therefore robust against a k'-passive adversary also) where k' < ck and c > 1 is any given constant.
The idea is to use the fact that it is NP-hard to approximate a vertex separator of an AND/OR graph from “above” (see Section 2 for details about approximating an NP-hard optimization problem from “above”). It follows that if one designs the AND/OR graph in such a way that the trusted participants can easily find vertex disjoint solution graphs in it (using some trap-doors), and the input vertices always initiate a computation through all solution graphs in the maximum set of vertex disjoint solution graphs, then dependable computation is possible. The benefit from using trap-doors in a computation system with multiple inputs is obvious. If we do not use trap-doors then, by extending the conventional fault tolerant computation theory (see, e.g., [4, 7, 10, 11, 15]), a k-connected AND/OR graph is only robust against k'-passive adversaries and k''-active adversaries, respectively, when k' < k and k'' < k/2. This is because if the adversary has the power to jam k vertices in the AND/OR graph and s/he can find a vertex separator of size k, then s/he can jam all of the vertices in the vertex separator and corrupt the system. Similarly, if the adversary has the power to examine and modify messages and data in ⌊k/2⌋ + 1 processors, then the adversary may let the ⌊k/2⌋ + 1 faulty processors create and send faulty messages to the output processor claiming that they come from some bogus solution graphs. This will convince the output vertex to accept the bogus message since the majority of the messages are faulty. However, if we use trap-doors in the design of AND/OR graphs, then with high probability, a k-connected AND/OR graph is robust against k'-active adversaries (therefore against k'-passive adversaries) where k' < ck and c > 1 is any given constant. The reason is that even though the adversary has the power to jam or control k' > k vertices in the AND/OR graph, he does not know which vertices to corrupt so that the corrupted vertices (in his control) appear on at least half of the k vertex disjoint solution graphs.

So one of the main problems is to design AND/OR graphs in which it is hard in the average case to approximate at least one half of a vertex separator from “above”. In Section 5, we will outline an approach to generate such AND/OR graphs. In the remaining part of this section we will demonstrate how to use these AND/OR graphs to achieve dependability.



Protocol I 

1. Alice generates a k-connected AND/OR graph $G_{\wedge\vee}$ such that $G_{\wedge\vee}$ can implement the desired computation and such that finding a set of ck vertices which contains at least one half of the elements of a vertex separator is hard, where c > 1 is any given constant. (The details are presented in Section 5.)

2. Using a secure channel, Alice sends the input vertices the method of initiating a computation and sends the output vertex a maximum set of vertex disjoint solution graphs in $G_{\wedge\vee}$.

3. In order to carry out one computation, Alice initiates the computation 
through all solution graphs in the maximum set of vertex disjoint solution 
graphs. 
4. When the output vertex gets all possible outputs, he compares the results from the k vertex disjoint solution graphs (note that the output vertex knows the maximum set of vertex disjoint solution graphs) and chooses the authentic result using a majority vote.
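Step 4 is a plain majority vote over the k results. The Python sketch below is only an illustration and is not part of the protocol. It assumes each solution graph delivers one hashable result and treats a tie as a failure. The function name `majority_result` is hypothetical. It also shows the threshold from the discussion of adversaries above: $\lfloor k/2 \rfloor + 1$ forged results outvote the honest ones.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_result(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of the solution graphs.

    outputs[i] is the result delivered through the i-th vertex disjoint
    solution graph. None is returned when no value has a strict majority
    (an assumption: the protocol text does not say how ties are handled).
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    # Strict majority: count >= floor(k/2) + 1.
    return value if count > len(outputs) // 2 else None


k = 5
honest = ["y"] * k
# An adversary controlling floor(k/2) + 1 = 3 solution graphs forges "bogus".
forged = ["bogus"] * (k // 2 + 1) + ["y"] * (k - k // 2 - 1)
print(majority_result(honest))  # y
print(majority_result(forged))  # bogus
```

Suppose fewer than $\lfloor k/2 \rfloor + 1$ results are forged. Then the honest value still wins for odd k, and the vote is undecided for even k.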

Note that the above protocol is not secure against a dynamic adversary who, after observing one computation, changes the vertices he controls. It is an interesting open problem to design protocols which are secure against dynamic adversaries.

Now assume that Mallory is a k'-active adversary (or a k'-passive adversary) where k' < ck for the constant c > 1, and $P = \{P_1, \ldots, P_k\}$ is a maximum set of vertex disjoint solution graphs in the AND/OR graph used in Protocol I. Mallory does not know how to find a set of k' vertices which contains at least one half of the elements of a vertex separator for P (finding such a set is very hard). Therefore she does not know which vertices to corrupt so that she can generate at least $\lfloor k/2 \rfloor + 1$ bogus messages to convince the output vertex to accept (or so that all these k solution graphs will be jammed), even though she has the power to corrupt k' vertices. It follows that the system is robust against a k'-active adversary (and therefore also against a k'-passive adversary) where k' < ck.

5 AND/OR graphs with trap-doors 

In this section, we outline an approach for constructing AND/OR graphs with 
trap-doors. We first show that it is NP-hard to approximate at least half of the 
elements of a vertex separator of an AND/OR graph from “above”. 

Theorem 7. Given an AND/OR graph $G_{\wedge\vee}(V_\wedge, V_\vee, INPUT, output; E)$, it is NP-hard to compute a vertex set $S' \subseteq V_\wedge \cup V_\vee$ with the following properties:

1. If $G_{\wedge\vee}$ is k-connected then $|S'| \le ck$.

2. For some vertex separator S of $G_{\wedge\vee}$, $|S \cap S'| \ge k/2$.

Proof. We reduce the problem of Theorem 6 to the problem of this Theorem. 

For a given graph G'(V, E'), we construct an AND/OR graph $G''_{\wedge\vee}(V''_\wedge, V''_\vee, INPUT'', output''; E'')$ as follows. Assume that $V = \{v_1, \ldots, v_n\}$. Let $INPUT'' = \{I_i, I_{ij} : i, j = 1, \ldots, n\}$, $V''_\vee = \{output''\}$, $V''_\wedge = \{u_{ij} : i, j = 1, \ldots, n\} \cup \{u_i : i = 1, \ldots, n\}$, and let E'' be the set of the following edges.

1. For each i = 1, ..., n, there is an edge $I_i \to u_i$.

2. For each pair i, j = 1, ..., n, there is an edge $I_{ij} \to u_{ij}$.

3. For each pair i, j = 1, ..., n such that $(v_i, v_j) \in E'$, there are four edges $u_{ij} \to u_i$, $u_{ij} \to u_j$, $u_{ji} \to u_i$, and $u_{ji} \to u_j$.

4. For each i, there is an edge $u_i \to output''$.
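The reduction is easy to prototype. The Python sketch below builds E'' from an edge list of G'. For small graphs, it checks the claim made in the proof that the solution graphs through $u_i$ and $u_j$ are vertex disjoint exactly when $(v_i, v_j) \notin E'$. The tuple encoding of vertices and all function names are hypothetical. We take the solution graph through $u_i$ to be $u_i$ together with every vertex from which $u_i$ can be reached, since each AND vertex needs all of its predecessors; output'' is left out.

```python
from itertools import combinations


def build_and_or_graph(n, edges):
    """Edge set E'' of the AND/OR graph built from G' = ({v_1..v_n}, edges).

    Vertices are encoded as tuples (hypothetical naming): ("I", i), ("I", i, j)
    for inputs, ("u", i), ("u", i, j) for AND vertices, ("output",) for output''.
    """
    E = set()
    for i in range(1, n + 1):
        E.add((("I", i), ("u", i)))               # rule 1: I_i -> u_i
        E.add((("u", i), ("output",)))            # rule 4: u_i -> output''
        for j in range(1, n + 1):
            E.add((("I", i, j), ("u", i, j)))     # rule 2: I_ij -> u_ij
    for i, j in edges:                            # rule 3: four edges per edge of G'
        for a, b in ((i, j), (j, i)):
            E.add((("u", a, b), ("u", i)))
            E.add((("u", a, b), ("u", j)))
    return E


def solution_graph(E, i):
    """Vertices of the solution graph through u_i (output'' excluded):
    everything from which u_i can be reached, since AND vertices need
    all of their predecessors."""
    preds = {}
    for src, dst in E:
        preds.setdefault(dst, set()).add(src)
    seen, stack = set(), [("u", i)]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(preds.get(v, ()))
    return seen


def disjoint_iff_independent(n, edges):
    """Check: P_i and P_j are vertex disjoint exactly when (v_i, v_j) is not in E'."""
    E = build_and_or_graph(n, edges)
    adjacent = {frozenset(e) for e in edges}
    for i, j in combinations(range(1, n + 1), 2):
        disjoint = solution_graph(E, i).isdisjoint(solution_graph(E, j))
        if disjoint != (frozenset((i, j)) not in adjacent):
            return False
    return True


# Path v1 - v2 - v3: {v1, v3} is independent, so P_1 and P_3 are disjoint.
print(disjoint_iff_independent(3, [(1, 2), (2, 3)]))  # True
```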

It is clear that two solution graphs $P_1$ and $P_2$ in $G''_{\wedge\vee}$ which go through $u_i$ and $u_j$ respectively are vertex disjoint if and only if there is no edge $(v_i, v_j)$ in E'. Hence there is a size k independent set in G' if and only if there are k vertex disjoint solution graphs in $G''_{\wedge\vee}$. Moreover, from k vertex disjoint solution graphs in $G''_{\wedge\vee}$ one can compute in linear time a size k independent set in G'. It is therefore sufficient to show the following: from each vertex set S' satisfying the conditions of the Theorem, one can compute in polynomial time an edge set $E_{S'} \subseteq E'$ with the following properties:

1. If $G''_{\wedge\vee}$ is k-connected (that is, if the optimal independent set in G' has size k) then $|E_{S'}| \le ck$.

2. $E_{S'}$ contains an independence eligible edge set of size at least $k/2$.

The following algorithm will output an edge set $E_{S'}$ with the above properties. In the following, S' is the vertex set satisfying the conditions of the Theorem, and $s_1, s_2, \ldots$ denote its elements.

— Let $E_{S'} = \emptyset$. For i = 1, ..., ck, we distinguish the following two cases:

1. $s_i = u_j$ for some $j \le n$. Let $E_{S'} = E_{S'} \cup \{(v', v_j)\}$, where v' is any vertex in G' which is adjacent to $v_j$.

2. $s_i = u_{j_1 j_2}$ for some $j_1, j_2 \le n$. Let $E_{S'} = E_{S'} \cup \{(v_{j_1}, v_{j_2})\}$ if $(v_{j_1}, v_{j_2}) \in E'$, and leave $E_{S'}$ unchanged otherwise.
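The two cases translate directly into code. In this Python sketch, the helper name and vertex encoding are hypothetical. The free choice of "any vertex adjacent to $v_j$" is fixed by taking the lexicographically smallest incident edge, and an isolated $v_j$ contributes nothing. Both are assumptions, not part of the proof. Each element of S' adds at most one edge, so $|E_{S'}| \le |S'|$.

```python
def edge_set_from_vertex_set(S, edges):
    """Map S' (AND vertices of the reduction) to the edge set E_{S'} of G'.

    S     -- vertices encoded as ("u", j) or ("u", j1, j2) (hypothetical encoding)
    edges -- edge list of G'
    """
    E_prime = {frozenset(e) for e in edges}
    E_S = set()
    for s in S:
        if len(s) == 2:                     # case 1: s_i = u_j
            j = s[1]
            incident = sorted(sorted(e) for e in E_prime if j in e)
            if incident:                    # assumption: skip v_j if it is isolated
                E_S.add(frozenset(incident[0]))  # "any" incident edge: take the smallest
        else:                               # case 2: s_i = u_{j1 j2}
            e = frozenset(s[1:])
            if e in E_prime:                # otherwise E_{S'} is left unchanged
                E_S.add(e)
    return E_S


print(edge_set_from_vertex_set([("u", 2), ("u", 1, 2), ("u", 1, 3)], [(1, 2), (2, 3)]))
# {frozenset({1, 2})}
```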

By the properties of S', it is clear that $E_{S'}$ has the required properties.

By Theorem 6, we have completed the proof of the Theorem. □ 

In the remaining part of this section, we outline how to construct AND/OR 
graphs with trap-doors. 



Construction. First generate a graph G'(V, E') and a number k which satisfy the conditions of Theorem 3 (or Theorem 4). Secondly, use the method in the proof of Theorem 7 to generate an AND/OR graph $G''_{\wedge\vee}$ with the following property: it is hard to approximate at least half of the elements of a vertex separator of $G''_{\wedge\vee}$ from “above”. The AND/OR graph $G_{\wedge\vee}$ is obtained by replacing all vertices $u_{ij}$ of $G''_{\wedge\vee}$ with the AND/OR graph which implements the desired computation. In summary, the construction proceeds as follows.

graph G' → AND/OR graph $G''_{\wedge\vee}$ → AND/OR graph $G_{\wedge\vee}$



6 Towards practical solutions 

In the previous section, we considered the problem of designing AND/OR graphs with trap-doors. Specifically, we constructed AND/OR graphs which are robust against ck-active adversaries (and therefore also against ck-passive adversaries). However, these constructions are inefficient and are only of theoretical interest. One of the most interesting open questions is how to efficiently generate hard instances of AND/OR graphs, especially for an arbitrary number k. Suppose we do not require that c be an arbitrary given constant. Then Theorem 5 can be used to construct AND/OR graphs which are more “efficient” than the AND/OR graphs constructed in the previous section (though they still have enormous complexity). These graphs are robust against (1 + ε)k-passive adversaries, where ε < 1 is a small positive rational number. However, in order to construct AND/OR graphs which are robust against ck-active adversaries for c > ^, we have to use Theorem 6 in our construction. Moreover, the size of the graph G in Theorem 6 becomes impractical if we want the security of the system to be at least as hard as an exhaustive search of a 1024-bit space.

We should also note that, in order to construct the AND/OR graphs in the previous section, we need to construct standard graphs which satisfy the conditions of Theorem 3 (or Theorem 4). That is, we need an algorithm to build graphs whose independent sets are hard to approximate in the average case (note that Theorem 7 only guarantees worst-case hardness instead of average-case hardness). Hence it is interesting (and open) to prove some average-case hardness results for the corresponding problems.

In the following, we consider the problem of constructing practical average-case hard AND/OR graphs which are robust against (k + c)-passive adversaries, where c is some given constant. The construction below is based on the hardness of factoring a large integer and does not use the approximation hardness results.

Construction. Let N be a large number which is a product of two primes p and q. We will construct an AND/OR graph $G_{\wedge\vee}$ with the following property: given the number N and a vertex separator for $G_{\wedge\vee}$, one can efficiently compute the two factors p and q. Let $x_1, \ldots, x_t$ and $y_1, \ldots, y_t$ be variables which take values 0 and 1, where $t = \lfloor \log N \rfloor$. Let $(x_t \cdots x_1)_2$ and $(y_t \cdots y_1)_2$ denote the binary representations of p and q respectively. Then use the relation

$$(x_t \cdots x_1)_2 \times (y_t \cdots y_1)_2 = N \qquad (1)$$

to construct a 3SAT formula C with the following properties: 

1. C has at most $O(t^2)$ clauses.

2. C is satisfiable and, from a satisfying assignment of C, one can compute in linear time an assignment of $x_1, \ldots, x_t, y_1, \ldots, y_t$ such that equation (1) is satisfied. That is, from a satisfying assignment of C, one can factor N easily.
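The text does not spell out how C is obtained from (1). One standard route, assumed here, is a two-step translation. First write the product as a shift-and-add Boolean circuit. Then translate every gate into a constant number of clauses (a Tseitin-style encoding); this gives the $O(t^2)$ bound of property 1. The Python sketch below only evaluates such a circuit on concrete bits and counts its gates. It does not emit clauses, and the function names are illustrative. We read log as $\log_2$. With $t = \lfloor \log_2 N \rfloor$, both factors then fit in t bits when $p, q \ge 2$.

```python
def to_bits(v, t):
    """Little-endian bits of v: bits[0] plays the role of x_1 in (1)."""
    return [(v >> i) & 1 for i in range(t)]


def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))


def multiplier_circuit(x, y):
    """Evaluate a shift-and-add multiplier made of AND/XOR/OR gates.

    Returns (product_bits, gate_count). For t-bit inputs the count is
    11 * t**2: t**2 ANDs for partial products plus 5 gates per full adder.
    """
    t = len(x)
    acc = [0] * (2 * t)          # running sum of partial products
    gates = 0
    for j in range(t):
        row = [0] * (2 * t)
        for i in range(t):
            row[i + j] = x[i] & y[j]     # partial product bit
            gates += 1
        carry = 0
        for k in range(2 * t):           # ripple-carry adder over 2t bits
            a, b = acc[k], row[k]
            half = a ^ b
            acc[k] = half ^ carry
            carry = (a & b) | (carry & half)
            gates += 5
    return acc, gates


N, p, q = 15, 3, 5
t = N.bit_length() - 1           # t = floor(log2 N) = 3
prod, gates = multiplier_circuit(to_bits(p, t), to_bits(q, t))
print(from_bits(prod) == N, gates)   # True 99
```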

Now use Lemma 2 to construct an n + m-SAT-graph G'(V, E') and a number $k = O(t^2)$ with the following property: from a size k independent set of G' one can compute in linear time a satisfying assignment of C. Lastly, use the method in the proof of Theorem 7 to generate an AND/OR graph $G_{\wedge\vee}$ such that, from a vertex separator of $G_{\wedge\vee}$, one can compute in linear time a size k independent set of G'. Note that, instead of approximating a vertex separator, here we need to know a whole vertex separator. As in the proof of Theorem 7, from a vertex separator of $G_{\wedge\vee}$ one can easily compute a size k independence eligible edge set of G'. From this edge set one can compute in linear time a size k independent set of G' (using the method of computing a satisfying assignment of a 2SAT formula).







It is straightforward to see that the above constructed AND/OR graph $G_{\wedge\vee}$ is robust against (k + c)-passive adversaries if factoring N is hard, where c is any given constant.
Abstract. We formally define a class of sequential pattern matching algorithms that includes all variations of the Morris-Pratt algorithm. For the last twenty years it was known that the complexity of such algorithms is bounded by a linear function of the text length. Recently, substantial progress has been made in identifying lower bounds. We now prove that there exists asymptotically a linearity constant for the worst and the average cases. We use the Subadditive Ergodic Theorem and prove an almost sure convergence. Our results hold for any given pattern and text and for stationary ergodic pattern and text. In the course of the proof, we establish a structural property, namely, the existence of “unavoidable positions” where the algorithm must stop to compare. This property seems to be uniquely reserved for Morris-Pratt type algorithms (e.g., the Boyer-Moore algorithm does not possess this property).



1 Introduction 

The complexity of string searching algorithms has been discussed in various papers (cf. [1, 6, 7, 8, 9, 12, 18]). It is well known that most pattern matching algorithms perform linearly in the worst case as well as “on average”. Several attempts have been made to provide tight bounds on the so-called “linearity constant”. Nevertheless, the existence of such a constant has never been proved. The only exception known to us is the average case of Morris-Pratt-like algorithms [18] (cf. [17]) for the symmetric Bernoulli model (independent generation of symbols with each symbol occurring with the same probability), where the constant was also explicitly computed.

In this paper we investigate a fairly general class of algorithms, called sequential algorithms, for which the existence of the linearity constant (in an asymptotic sense) is proved for the worst and the average case. Sequential algorithms include the naive one and several variants of the Morris-Pratt algorithm [16]. These algorithms never go backward, work by comparisons, and are easy to implement. They perform better than Boyer-Moore like algorithms in numerous cases, e.g., for binary alphabet [2], when character distributions are strongly biased, and when the pattern and text distributions are correlated. Thus, even from a practical point of view these algorithms are worth studying.
In this paper we analyze sequential algorithms under a general probabilistic model that only assumes stationarity and ergodicity of the text and pattern sequences. We show that the asymptotic complexity grows linearly with the text length for all but finitely many strings (i.e., in the almost sure sense). The proof relies on the Subadditive Ergodic Theorem [11].

The literature on the worst case as well as the average case of Knuth-Morris-Pratt type algorithms is rather scanty. For almost twenty years the upper bound was known [16], and no progress has been reported on a lower bound or a tight bound. This was partially rectified by Colussi et al. [8] and Cole et al. [7], who established several lower bounds for the so-called “on-line” sequential algorithms. However, the existence of the linearity constant has not yet been established, at least for the “average complexity” under the general probabilistic model assumed in this paper. In the course of proving our main result, we construct the so-called unavoidable positions where the algorithm must stop to compare. The existence of these positions is crucial for establishing the subadditivity of the complexity function for the Morris-Pratt type algorithms, and hence their linearity. This property seems to be restricted to Morris-Pratt type algorithms (e.g., the Boyer-Moore algorithm does not possess any unavoidable position).

The paper is organized as follows. In the next section we present a general 
definition of sequential algorithms, and formulate our main results. Section 3 
contains all proofs. In concluding remarks we apply Azuma’s inequality to show 
that the complexity is well concentrated around its most likely value (even if the 
value of the linearity constant is still unknown). 

2 Sequential Algorithms 

In this section, we first present a general definition of sequential algorithms (i.e., 
algorithms that work like Morris-Pratt). Then, we formulate our main results 
and discuss some consequences. 



2.1 Basic Definitions 

Throughout we write p and t for the pattern and the text, which are of lengths m and n, respectively. The ith character of the pattern p (text t) is denoted as p[i] (t[i]). By $t_i^j$ we denote the substring of t starting at position i and ending at position j, that is, $t_i^j = t[i]t[i+1] \cdots t[j]$. We also assume that for a given pattern p its length m does not vary with the text length n.

Our prime goal is to investigate the complexity of string matching algorithms that work by comparisons (i.e., the so-called comparison model).

Definition 1. (i) For any string matching algorithm that runs on a given text t and a given pattern p, let M(l,k) = 1 if the lth symbol t[l] of the text is compared by the algorithm to the kth symbol p[k] of the pattern, and M(l,k) = 0 otherwise. We assume in the following that this comparison is performed at most once.
(ii) For a given pattern matching algorithm, the partial complexity function $c_{r,s}$ is defined as

$$c_{r,s}(t,p) = \sum_{l=r}^{s} \sum_{k=1}^{m} M[l,k] \qquad (1)$$

where $1 \le r < s \le n$. For r = 1 and s = n the function $c_n := c_{1,n}$ is simply called the complexity of the algorithm. If either the pattern or the text is a realization of a random sequence, then we denote the complexity by a capital letter, that is, we write $C_n$ instead of $c_n$.

Our goal is to find an asymptotic expression for $c_n$ and $C_n$ for large n under deterministic and stochastic assumptions regarding the strings p and t. (For simplicity of notation we often write $c_n$ instead of $c_n(t,p)$.) We need some further definitions that will lead to a formal description of sequential algorithms.

We start with a definition of an alignment position. 

Definition 2. Given a string searching algorithm, a text t and a pattern p, a position AP in the text t satisfying, for some k (1 ≤ k ≤ m),

$$M[AP + (k-1), k] = 1$$

is said to be an alignment position.

Intuitively, at some step of the algorithm, an alignment of pattern p at position AP is considered, and a comparison is made with character p[k] of the pattern.

Finally, we are ready to define sequential algorithms. Sequentiality refers to a special structure of a sequence of positions that pattern and text visit during a string matching algorithm. Throughout, we shall denote these sequences as $(l_i, k_i)$. Here $l_i$ refers to the text position visited during the ith comparison, and $k_i$ refers to the position of the pattern when the pattern is aligned at position $l_i - k_i + 1$.

Definition 3. A string searching algorithm is said to be:

(i) semi-sequential if the text is scanned from left to right;

(ii) strongly semi-sequential if the order of text-pattern comparisons actually performed by the algorithm defines a non-decreasing sequence of text positions $(l_i)$ and if the sequence of alignment positions is non-decreasing;

(iii) sequential (respectively strongly sequential) if they satisfy (i) (respectively (ii)) and if, additionally, for any k > 1

$$M[l,k] = 1 \;\Rightarrow\; t_{l-k+1}^{l-1} = p_1^{k-1} \qquad (2)$$

In passing, we point out that condition (i) means that the text is read from left to right. Note that our assumption on non-decreasing text positions in (ii) implies (i). Furthermore, non-decreasing alignment positions imply that all occurrences of the pattern before this alignment position were detected before this choice. Nevertheless, these constraints on the sequence of text-pattern comparisons are not enough to prevent the algorithm from “fooling around”, nor to guarantee a general tight bound on the complexity. Although (2) is not a logical consequence of semi-sequentiality, it represents a natural way of using the available information for semi-sequential algorithms. In that case, the subpattern $t_{l-k+1}^{l-1}$ is known when t[l] is read. There is no need to compare p[k] with t[l] if $t_{l-k+1}^{l-1}$ is not a prefix of p of size k − 1, i.e., if AP = l − (k − 1) has already been disregarded.

We now illustrate our definition on several examples. 

Example 1: Naive or brute force algorithm 

The simplest string searching algorithm is the naive one. All text positions are alignment positions. For a given one, say AP, the text is scanned until the pattern is found or a mismatch occurs. Then, AP + 1 is chosen as the next alignment position and the process is repeated.

This algorithm is sequential (hence semi-sequential) but not strongly sequential. Condition (ii) is violated after any mismatch on an alignment position l with parameter k ≥ 3, as comparison (l + 1, 1) occurs after (l + 1, 2) and (l + 2, 3).
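Example 1 and Definitions 1 and 3 can be checked mechanically by logging every pair (l, k) with M(l, k) = 1. The Python sketch below is illustrative only. Positions are 1-based as in the text, and the naive algorithm is assumed to try alignment positions 1, ..., n − m + 1 only. The helper names are hypothetical. The sketch counts $c_{r,s}$ as in (1) and tests condition (ii) of Definition 3.

```python
def naive_comparisons(p, t):
    """Comparisons (l, k), 1-based, made by the naive algorithm of Example 1."""
    n, m = len(t), len(p)
    log = []
    for ap in range(1, n - m + 2):          # every alignment position in turn
        for k in range(1, m + 1):
            l = ap + k - 1
            log.append((l, k))
            if t[l - 1] != p[k - 1]:        # mismatch: move on to AP + 1
                break
    return log


def partial_complexity(log, r, s):
    """c_{r,s}: number of comparisons whose text position l lies in [r, s]."""
    return sum(1 for l, _ in log if r <= l <= s)


def strongly_semi_sequential(log):
    """Condition (ii) of Definition 3: text positions l_i and alignment
    positions l_i - k_i + 1 both form non-decreasing sequences."""
    ls = [l for l, _ in log]
    aps = [l - k + 1 for l, k in log]
    return all(a <= b for a, b in zip(ls, ls[1:])) and all(
        a <= b for a, b in zip(aps, aps[1:]))


log = naive_comparisons("abc", "abdabc")
print(log)  # [(1, 1), (2, 2), (3, 3), (2, 1), (3, 1), (4, 1), (5, 2), (6, 3)]
print(partial_complexity(log, 1, 6), strongly_semi_sequential(log))  # 8 False
```

The pair (3, 3) followed by (2, 1) is the violation described in Example 1, with l = 1.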

Example 2: Morris-Pratt-like algorithms [16, 19].

It was already noted in [16] that after a mismatch occurs when comparing t[l] with p[k], some alignment positions in [l + 1, ..., l + k − 1] can be disregarded without further text-pattern comparisons. Namely, these are the positions at which the text already read is inconsistent with an occurrence of p. Equivalently, they correspond to the shifts i with $p_{i+1}^{k-1} \ne p_1^{k-1-i}$, so the set of such i can be known by a preprocessing of p. Other i define the “surviving candidates”. Choosing the next alignment position among the surviving candidates is enough to ensure that condition (ii) in Definition 3 holds. Different choices lead to different variants of the classic Morris-Pratt algorithm [16]. They differ by the use of the information obtained from the mismatching position. We formally define three main variants, and provide an example. One defines a shift function S to be used after any mismatch as:

Morris-Pratt variant:

$$S = \min\{k - 1;\ \min\{s > 0 : p_{1+s}^{k-1} = p_1^{k-1-s}\}\}$$

Knuth-Morris-Pratt variant:

$$S = \min\{k;\ \min\{s : p_{1+s}^{k-1} = p_1^{k-1-s} \text{ and } p[k] \ne p[k-s]\}\}$$
Simon variant:

$$K = \max\{k : M(l,k) = 1\}$$

$$B = \{s : p_{1+s}^{K-1} = p_1^{K-1-s} \text{ and } 0 \le s \le K - k\}$$

$$S = \min\{d > 0 : p_{1+d}^{k-1} = p_1^{k-1-d} \text{ and } (p[k-d] \ne p[K-s],\ s \in B)\}$$

Example 3: Illustration to Definition 3. 

Let p = abacabacabab and t = abacabacabaaa. The first mismatch occurs for 
M(12, 12). The comparisons performed from that point are: 

— Morris-Pratt variant: 

(12, 12); (12, 8); (12, 4); (12, 2); (12, 1); (13, 2); (13, 1) , 

where the text character is compared with pattern characters (b, c, c, b, a, b, a) 
with the alignment positions (1,5,9, 11, 12, 12, 13). 

— Knuth-Morris-Pratt variant:

(12, 12); (12, 8); (12, 2); (12, 1); (13, 2); (13, 1),

where the text character is compared with pattern characters (b, c, b, a, b, a) with the alignment positions (1, 5, 11, 12, 12, 13).

— Simon variant:

(12, 12); (12, 8); (12, 1); (13, 2); (13, 1),

where the text character is compared in turn with pattern characters (b, c, a, b, a) with the alignment positions (1, 5, 12, 12, 13).

Some observations follow. The Morris-Pratt variant considers one alignment position at a time. The optimal sequential algorithm, that of Simon, considers several alignment positions at the same time and may disregard several of them simultaneously (e.g., in Example 3, positions 1 and 9 at the first step and 5 and 11 at the second step). It is interesting to observe that the subset {1, 5, 12} of alignment positions appears in all variants. We will see that they share a common property of “unavoidability” explored below.
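The Morris-Pratt and Knuth-Morris-Pratt runs of Example 3 can be reproduced with the textbook failure-function formulation of these algorithms. We assume here that this formulation realizes the shift functions S defined above. Simon's variant needs extra bookkeeping and is omitted. The Python sketch logs 1-based comparisons (l, k); function names are illustrative.

```python
def mp_failure(p):
    """f[j] = length of the longest proper border of p[:j] (f[0] = -1)."""
    f = [-1] + [0] * len(p)
    k = -1
    for j, c in enumerate(p):
        while k >= 0 and p[k] != c:
            k = f[k]
        k += 1
        f[j + 1] = k
    return f


def kmp_failure(p):
    """Knuth's refinement: skip a border whose next symbol equals p[j]."""
    f = mp_failure(p)
    g = f[:]
    for j in range(1, len(p)):
        if p[f[j]] == p[j]:
            g[j] = g[f[j]]
    return g


def comparisons(p, t, fail):
    """All comparisons (l, k), 1-based, made by the table-driven search."""
    log, j = [], 0                  # j = number of pattern symbols matched
    for i, c in enumerate(t):
        while j >= 0:
            log.append((i + 1, j + 1))
            if p[j] == c:
                break
            j = fail[j]             # mismatch: fall back to a shorter prefix
        j += 1
        if j == len(p):             # occurrence: continue from the border
            j = fail[j]
    return log


p, t = "abacabacabab", "abacabacabaaa"
mp, kmp = comparisons(p, t, mp_failure(p)), comparisons(p, t, kmp_failure(p))
print(mp[11:])   # [(12, 12), (12, 8), (12, 4), (12, 2), (12, 1), (13, 2), (13, 1)]
print(kmp[11:])  # [(12, 12), (12, 8), (12, 2), (12, 1), (13, 2), (13, 1)]
print([l - k + 1 for l, k in mp[11:]])  # [1, 5, 9, 11, 12, 12, 13]
```

The first eleven comparisons are the matches (1, 1), ..., (11, 11). The printed tails match the sequences listed in Example 3.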

Our definition of semi-sequentiality is very close to the definition of sequentiality given in [13]. We do not use the “on-line” concept of [6]. The on-line algorithms are very close to our strongly sequential ones. Also, while condition (2) is a natural optimization for semi-sequential algorithms, it does not seem to hold for other efficient algorithms discussed in [8].

Finally, in the course of proving our main result we discover an interesting structural property of sequential algorithms that we already observed in Ex. 3. Suppose the algorithm is run on a substring of the text, say $t_r^n$. Then there are some positions i ≥ r that are unavoidable alignment positions, that is, the algorithm must align at these positions at some step (e.g., see positions {1, 5, 12} in Ex. 3). More formally:
Definition 4. For a given pattern p, a position i in the text $t_1^n$ is an unavoidable alignment position for an algorithm if, for any r, l such that r ≤ i and l ≥ i + m, the position i is an alignment position when the algorithm is run on $t_r^l$.



Having in mind the above definitions, we can describe our last class of sequential algorithms (containing all variants of KMP-like algorithms) for which we formulate our main results.

Definition 5. An algorithm is said to be $\ell$-convergent if, for any text t and pattern p, there exists an increasing sequence $\{U_i\}$ of unavoidable alignment positions satisfying $U_{i+1} - U_i \le \ell$, where $U_0 = 0$ and $n - \max_i U_i \le \ell$.

In passing we note that the naive pattern matching algorithm (cf. Ex. 1) is 1-convergent. We prove below that all strongly sequential algorithms (i.e., all Morris-Pratt-like algorithms) are m-convergent. This will further imply several interesting and useful properties of these algorithms (e.g., linear complexity).



2.2 Main Results 

In this section we formulate our main results. Before, however, we must describe 
modeling assumptions concerning the strings. We adopt one of the following 
assumptions: 

(A) Worst-Case (Deterministic) Model 

Both strings p and t are non-random (deterministic) and p is given.

(B) Semi-Random model 

The text string t is a realization of a stationary and ergodic sequence while 
the pattern string p is given. 

(C) Stationary Model 

Strings t and p are realizations of a stationary and ergodic sequence (cf. [3]). (Roughly speaking, a sequence, say $t_1^n$, is stationary if the probability distribution is the same for all substrings of equal sizes, say $t_i^{i+k}$ and $t_j^{j+k}$ for $1 \le i < j \le n$.)

Formulation of our results depends on the model we work with. In the deterministic model we interpret the complexity $c_n(t,p)$ as the worst case complexity (i.e., we maximize the complexity over all texts). Under assumption (B) we consider almost sure (a.s.) convergence of $C_n$. More formally, let $a_n$ be a deterministic sequence and α a constant. We write $C_n/a_n \to \alpha$ (a.s.) if $\lim_{n\to\infty} \Pr\{\sup_{k \ge n} |C_k/a_k - \alpha| > \varepsilon\} = 0$ for any $\varepsilon > 0$ (cf. [3]). Finally, in the stationary model (C) we use the standard average case complexity, that is, $\mathbf{E}C_n$. Now we are ready to formulate our main results.
Theorem 6. Consider an $\ell$-convergent sequential string matching algorithm with $\ell \le m$. Let p be a given pattern of length m.

(i) Under assumption (A) the following holds:

$$\lim_{n\to\infty} \frac{\max_{|t|=n} c_n(t,p)}{n} = \alpha_1(p) \qquad (3)$$

where $\alpha_1(p) \ge 1$ is a constant.

(ii) Under assumption (B) one finds

$$\frac{C_n(p)}{n} \to \alpha_2(p) \quad \text{(a.s.)} \qquad (4)$$

where $\alpha_2(p) \ge 1$ is a constant. If $\mathbf{E}_t$ denotes the average cost over all text strings, the following also holds:

$$\lim_{n\to\infty} \frac{\mathbf{E}_t C_n(p)}{n} = \alpha_2(p) \qquad (5)$$

Theorem 7. Consider an $\ell$-convergent sequential string matching algorithm. Under assumption (C) we have

$$\lim_{n\to\infty} \frac{\mathbf{E}_{t,p} C_n}{n} = \alpha_3 \qquad (6)$$

provided $m = o(\sqrt{n})$, where $\alpha_3 \ge 1$ is a constant and $\mathbf{E}_{t,p}$ denotes the average over all text strings of size n and patterns of size m.



Finally, with respect to our main class of algorithms, namely, Morris-Pratt like (i.e., sequential) algorithms, we shall prove in the next section the following results concerning the existence of unavoidable positions.



Theorem 8. Given a pattern p and a text t, all strongly sequential algorithms have the same set of unavoidable alignment positions $U = \{U_l : 1 \le l \le n\}$, where

$$U_l = \min\Big\{\min_{1 \le k \le l}\{k : t_k^l \preceq p\},\ l + 1\Big\} \qquad (7)$$

and $t_k^l \preceq p$ means that the substring $t_k^l$ is a prefix of the pattern p.
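Formula (7) can be evaluated by brute force and compared with the alignment positions actually used by a strongly sequential algorithm. The Python sketch below does this for the Morris-Pratt variant on random binary strings; it is a sanity check, not a proof. It keeps only values $U_l \le n$, because the value $l + 1 = n + 1$ is not a text position. The helper names, alphabet and sample sizes are arbitrary choices.

```python
import random


def unavoidable_positions(p, t):
    """Brute-force evaluation of (7), keeping only values U_l <= n."""
    n, m = len(t), len(p)
    U = set()
    for l in range(1, n + 1):
        u = l + 1
        for k in range(max(1, l - m + 1), l + 1):   # prefixes of p have length <= m
            if p.startswith(t[k - 1:l]):            # t_k^l is a prefix of p
                u = k
                break
        if u <= n:
            U.add(u)
    return U


def mp_alignment_positions(p, t):
    """Set of alignment positions l - k + 1 used by the Morris-Pratt search."""
    f = [-1] + [0] * len(p)
    k = -1
    for j, c in enumerate(p):                       # failure function
        while k >= 0 and p[k] != c:
            k = f[k]
        k += 1
        f[j + 1] = k
    aps, j = set(), 0
    for i, c in enumerate(t):
        while j >= 0:
            aps.add(i - j + 1)                      # comparison (i + 1, j + 1)
            if p[j] == c:
                break
            j = f[j]
        j += 1
        if j == len(p):
            j = f[j]
    return aps


random.seed(1)
for _ in range(500):
    p = "".join(random.choice("ab") for _ in range(random.randint(1, 5)))
    t = "".join(random.choice("ab") for _ in range(random.randint(1, 30)))
    assert unavoidable_positions(p, t) <= mp_alignment_positions(p, t), (p, t)
print("all sampled U sets are Morris-Pratt alignment positions")
```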

Theorem 9. Strongly sequential algorithms (e.g., Morris-Pratt like algorithms) 
are m-convergent and (3)-(6) hold. 



In summary, the above says that there exists a constant α such that $c_n = \alpha n + o(n)$ and/or $\mathbf{E}C_n = \alpha n + o(n)$. All previous results have only been able to show that $c_n = \Theta(n)$, but they did not exclude some bounded fluctuation of the coefficient at n. We should point out that in the analysis of algorithms on words such a fluctuation can occur in some problems involving suffix trees (cf. [4, 14, 20]). But in this paper we prove that such a fluctuation cannot take place for the complexity function of the strongly sequential pattern matching algorithms. For example, in the worst case we prove the following: for any given pattern p, any ε > 0 and any $n \ge n_\varepsilon$, one can find a text $t_1^n$ such that $|c_n(t_1^n, p)/n - \alpha_1(p)| < \varepsilon$.
3 Analysis

In this section we prove Theorems 6-9. The idea of the proof is quite simple. We shall show that a function of the complexity is subadditive, namely $c'_n = c_n + f(m)$, where f(m) is a function of the length m of the pattern p. In the “average case analysis” we indicate that under assumption (C) the average complexity $C_n$ is a stationary and ergodic sequence. Then, direct application of an extension of Kingman’s Subadditive Ergodic Theorem due to Derriennic [10] will do the job of proving our results. In passing, we point out that the most challenging part is establishing the subadditivity property, to which most of this section is devoted.

For the reader’s convenience we start this section with a brief review of the subadditive ergodic theorem (cf. [11, 15]).

Theorem 10. (Subadditive Sequence) (i) Let a (deterministic) nonnegative sequence $\{x_n\}_{n \ge 1}$ satisfy the following property, called subadditivity:

$$x_{m+n} \le x_m + x_n. \qquad (8)$$

Then

$$\lim_{n\to\infty} \frac{x_n}{n} = \inf_{m \ge 1} \frac{x_m}{m} = \alpha \qquad (9)$$

for some constant α.

(ii) (Subadditive Ergodic Theorem [15]). Let $X_{m,n}$ (m < n) be a sequence of nonnegative random variables satisfying the following three properties:

(a) $X_{0,n} \le X_{0,m} + X_{m,n}$ (subadditivity);

(b) $X_{m,n}$ is stationary (i.e., the joint distributions of $X_{m,n}$ are the same as those of $X_{m+1,n+1}$) and ergodic (cf. [3]);

(c) $\mathbf{E}X_{0,1} < \infty$.

Then,

$$\lim_{n\to\infty} \frac{\mathbf{E}X_{0,n}}{n} = \gamma \quad \text{and} \quad \lim_{n\to\infty} \frac{X_{0,n}}{n} = \gamma \ \text{(a.s.)} \qquad (10)$$

for some constant γ.

(iii) (Almost Subadditive Ergodic Theorem [10]). Suppose the subadditivity inequality is replaced by

$$X_{0,n} \le X_{0,m} + X_{m,n} + A_n \qquad (11)$$

with $\lim_{n\to\infty} \mathbf{E}A_n/n = 0$. Then (10) holds, too.

Thus, to prove our main results we need to establish the subadditivity property for the complexity $c_n(t,p)$ (for all texts t and patterns p). The next lemma proves such a result for $\ell$-convergent sequential algorithms.
Lemma 11. An $\ell$-convergent semi-sequential (or strongly semi-sequential) algorithm satisfies the basic inequality for all r such that $1 \le r \le n$:

$$|c_{1,n} - (c_{1,r} + c_{r,n})| \le \ell m, \qquad (12)$$

provided any comparison is done only once.

Proof. Let $U_r$ be the smallest unavoidable position greater than r. We evaluate in turn $c_{1,n} - (c_{1,r} + c_{U_r,n})$ and $c_{r,n} - c_{U_r,n}$ (cf. Figure 1). We start our analysis by considering $c_{1,n} - (c_{1,r} + c_{U_r,n})$. This part involves the following contributions:

— Those comparisons that are performed after position r but with alignment positions before r. We call this contribution $S_1$. Observe that those comparisons contribute to $c_{1,n}$ but not to $c_{1,r}$. To avoid counting the last character r twice, we must subtract one comparison. Thus

$$S_1 = \sum_{AP < r} \sum_{i \ge r} M(i, i - AP + 1) - 1.$$

— This contribution, which we call $S_2$, accounts for alignments AP satisfying $r \le AP < U_r$ that only contribute to $c_{1,n}$, that is,

$$S_2 = \sum_{AP=r}^{U_r - 1} \sum_{i \le m} M(AP + (i-1), i).$$

— Finally, the alignment positions after $U_r$ on the texts $t_1^n$ and $t_{U_r}^n$ are the same. Hence the only difference in contribution may come from the amount of information saved from previous comparisons done on the text before $U_r$. This is clearly bounded by

$$|c_{1,n} - (c_{1,r} + c_{U_r,n} + S_1 + S_2)| \le m.$$

Now, we evaluate $c_{r,n} - c_{U_r,n}$ (see second part of Figure 1). We assume that the algorithm runs on $t_r^n$ and let AP be any alignment position satisfying $r \le AP < U_r$. The following contributions must be considered:

— The contribution

$$S_3 = \sum_{AP=r}^{U_r - 1} \sum_{i} M(AP + (i-1), i)$$

counts the number of comparisons associated with positions $r \le AP < U_r$. This sum is the same as $S_2$, but the associated alignment positions and searched text positions AP + k − 1 may be different.

— Additional contribution may come from the alignment at position $U_r$. But no more than m comparisons can be saved from previous comparisons, hence

$$|c_{r,n} - (c_{U_r,n} + S_3)| \le m.$$
Fig. 1. Illustration to the proof of Lemma 11



To complete the proof, it remains to find upper bounds on $S_1$, $S_2$, and $S_3$. For $\ell \ge U_r - r$ we easily see that $S_2$ and $S_3$ are smaller than $\ell m$, and so is their difference. With respect to $S_1$, for a given alignment position AP, we have $|i - AP| < m$. This implies that $|r - AP| < m$, and for any AP the index i has at most m different values. Thus, $S_1 < m^2$, as desired. ■

Now we are ready to prove m-convergence for strongly sequential algorithms, i.e., Theorem 9. It relies on Theorem 8, so we present the proof of Theorem 8 first. Let l be a text position such that $1 \le l \le n$, and r be any text position satisfying $r < U_l$. Let $\{A_i\}$ be the set of alignment positions defined by a strongly sequential algorithm that runs on $t_r^n$. As it contains r, we may define (cf. Figure 2)

$$A_j = \max\{A_i : A_i < U_l\}.$$

Hence, we have $A_{j+1} \ge U_l$. Using an adversary argument, we shall prove that $A_{j+1} > U_l$ cannot be true, thus showing that $A_{j+1} = U_l$. Let $y = \max\{k : M(A_j + (k-1), k) = 1\}$, that is, y is the rightmost point at which we can do a comparison starting from $A_j$. We observe that $y \le l$. Otherwise, according to the algorithm rule, we would have $t_{A_j}^l \preceq p$, which contradicts the definition of $U_l$. Also, since the algorithm is sequential, $A_{j+1} \le y + 1 \le l + 1$. Hence $U_l = l + 1$ contradicts the assumption $A_{j+1} > U_l$, and we may assume $U_l \le l$.
In that case, $t_{U_l}^l \preceq p$, and an occurrence of p at position $U_l$ is consistent with the available information. Let the adversary assume that p does occur. The sequence $\{A_i\}$ is non-decreasing and $A_{j+1}$ has been chosen greater than $U_l$. Therefore this occurrence will not be detected by the algorithm, which leads to a contradiction. Thus $A_{j+1} = U_l$, as desired. This completes the proof of Theorem 8.



Fig. 2. Illustration to the proof of Theorem 8 (text and pattern alignments; marked positions $r$, $A_j$, $U_l$, $y$, $l$).



Finally, we turn to the proof of Theorem 9. Let $AP$ be an alignment position and define $l = AP + m$. As $|p| = m$, one has $l - (m - 1) \le U_l \le l$. Hence, $U_l - AP \le m$, which establishes the $m$-convergence.

We now apply Theorem 10 to prove Theorems 6 and 7. After substituting $X_{i,n} = C_{i,n} + 1.5m^2 + im$, we get subadditivity for any given $p$ and deterministic $t$. The worst case complexity results follow since

$$\max_{|t|=n} C_{1,n} \le \max_{|t|=r} C_{1,r} + \max_{|t|=n-r} C_{r,n}.$$

Now, let the text range over the set of texts of size $n$, and let its prefix and suffix range over the sets of texts of size $r$ and $n - r$. Then, as the text distribution is stationary, subadditivity holds in case (B). Also, the cost $C_{r,n}$ is stationary when the text distribution is. Applying the Subadditive Ergodic Theorem yields (4) and (5).

We turn now to the average complexity. The uniform bound [16] on the linearity constant allows us to define $E_p(E_t(C_{r,n}))$ when $p$ ranges over a random (possibly infinite) set of patterns. The subadditivity property transfers to $E_{t,p}(C_n)$, and (6) follows. This completes the proof of Theorems 6 and 7.



4 Concluding Remarks 

We considered here sequential algorithms that are variants of the classical Morris-Pratt algorithm. We provided a formal definition, but the main property we use is the existence of so-called unavoidable positions in any window of fixed size (here the length of the searched pattern $p$). Hence, the result extends to any algorithm that satisfies such a property [6, 13].
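As an illustration only, the following Python sketch implements a plain Morris-Pratt matcher (border table plus left-to-right scan) that also counts character comparisons, the cost measure studied above. The function names and the 0-indexed conventions are our own assumptions, not the formal model of the paper.

```python
def mp_border_table(p):
    """Morris-Pratt table (0-indexed): border[k] is the length of the longest
    proper border of p[:k], with border[0] = -1 as a sentinel."""
    border = [-1] * (len(p) + 1)
    k = -1
    for j in range(len(p)):
        while k >= 0 and p[k] != p[j]:
            k = border[k]
        k += 1
        border[j + 1] = k
    return border


def morris_pratt(t, p):
    """Return (occurrence positions, number of character comparisons)."""
    border = mp_border_table(p)
    m = len(p)
    occurrences, comparisons = [], 0
    k = 0  # number of pattern characters matched at the current alignment
    for i, c in enumerate(t):
        # Fall back along borders until p[k] matches t[i] or no border is left.
        while k >= 0:
            comparisons += 1
            if p[k] == c:
                break
            k = border[k]
        k += 1
        if k == m:  # full occurrence ending at text position i
            occurrences.append(i - m + 1)
            k = border[m]
    return occurrences, comparisons


print(morris_pratt("abababa", "aba"))  # ([0, 2, 4], 7)
```

Each comparison either succeeds (at most once per text character) or strictly decreases `k`, which gives the classical bound of at most $2n$ comparisons on a text of length $n$.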

Nevertheless, in order to speed up the search, Boyer and Moore introduced in [5] a quite different algorithm. Given an alignment position $AP$, matches against $p$ are checked from right to left, i.e. $k$ is decreasing. Several variants have been proposed that differ in the amount of information saved to compute the next alignment position.
We point out here that Boyer-Moore-like algorithms do not satisfy the unavoidability property. We provide an example for the Horspool variant: given an alignment position $AP$, the next alignment position is computed by aligning the text character $t[AP + m]$ with $t[AP + j]$, where

$$m - j = \min\{\max\{k : p[k] = t[AP + m]\}, m\}.$$

Let us now consider as an example p = x^ax^bx^a, with $x \ne a, b$. When $t[AP + m]$ is $a$ (resp. $b$ or $x$), the next alignment position is chosen to be $AP + 6$ (resp. $AP + 3$ or $AP + 1$). When $t[AP + m] \notin \{a, b, x\}$, one shifts the alignment position by $m$. Assume now that t = y^^az'^{bazbz^)^ with $y \ne x$ and natural $n$. If the Boyer-Moore-Horspool algorithm starts with $AP = 1$, a mismatch occurs on the second comparison between $t[10]$ and $p[10]$, and $AP$ is shifted by 6. The same event then recurs, and we eventually get the sequence $AP_i = 1 + 6i$. Assume now that we split the text at $r = 6$. As $t[16]$ is $b$, one shifts by 3 and $b$ is found again. Finally, one gets the sequence $AP'_i = 6 + 3i$. As $\gcd(6, 3)$ does not divide $5$, these two sequences are disjoint and there is no unavoidable position.
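To make the Horspool example concrete, the sketch below computes the usual Horspool bad-character shift table and tests whether two arithmetic progressions of alignment positions can meet. The pattern `xxxaxxbxxa` is a hypothetical choice: the exponents of the example pattern were lost in extraction, and this is merely one pattern whose shifts for $a$, $b$ and $x$ are 6, 3 and 1, as stated in the text. The progression test relies on the standard fact that $a + d_1 i = b + d_2 j$ has integer solutions iff $\gcd(d_1, d_2)$ divides $b - a$; with positive steps, it then also has solutions with $i, j \ge 0$.

```python
from math import gcd


def horspool_shifts(p):
    """Standard Horspool bad-character table (0-indexed).

    A character c occurring in p[:-1] gets shift m - 1 - j, where j is its
    last position in p[:-1]; every other character gets the default shift m.
    """
    m = len(p)
    shifts = {}
    for j in range(m - 1):
        shifts[p[j]] = m - 1 - j  # later positions overwrite earlier ones
    return shifts, m


def progressions_meet(a, d1, b, d2):
    """True iff {a + d1*i : i >= 0} and {b + d2*j : j >= 0} share an element
    (assuming d1, d2 > 0)."""
    return (b - a) % gcd(d1, d2) == 0


shifts, default = horspool_shifts("xxxaxxbxxa")  # hypothetical pattern, m = 10
print(shifts, default)                # {'x': 1, 'a': 6, 'b': 3} 10
print(progressions_meet(1, 6, 6, 3))  # False: 1 + 6i never equals 6 + 3j
```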

It follows that unavoidability cannot be used to prove linearity of Boyer-Moore algorithms. Nevertheless, it is clear that we assumed a very strong (and unlikely) structure on both text and pattern. In a recent paper [17], the almost sure existence of renewal points made it possible to prove the existence of a linearity constant.

It is worth noting that the Subadditive Ergodic Theorem proves the existence of the linearity constant under quite general probabilistic assumptions. The computation of the constant is difficult, and only limited success has been achieved so far (cf. [13, 18, 17]). However, even if we cannot compute the constant, we can prove that $C_n$ is well concentrated around its most probable value $\alpha_2 n$. Using Azuma's inequality (cf. [21]), we conclude the following.

Theorem 12. Let the text $t$ be generated by a memoryless source (i.e., $t$ is an i.i.d. sequence). The number of comparisons $C_n$ made by the Knuth-Morris-Pratt algorithm is concentrated around its mean $EC_n = \alpha_2 n(1 + o(1))$, that is,

$$\Pr\{|C_n - \alpha_2 n| > \varepsilon n\} \le 2\exp(\cdots)$$

for any $\varepsilon > 0$.
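Theorem 12 is not something a simulation can verify, and the sketch below does not estimate $\alpha_2$. It only illustrates the concentration phenomenon: it runs a Knuth-Morris-Pratt matcher (the Morris-Pratt table strengthened by Knuth's rule of skipping a border followed by the same character) on random i.i.d. binary texts and reports the smallest and largest observed values of $C_n/n$. The pattern `abab`, the alphabet, the seed and the number of trials are arbitrary illustrative choices.

```python
import random


def kmp_table(p):
    """Knuth-Morris-Pratt failure table (0-indexed, -1 means no border left)."""
    m = len(p)
    border = [-1] * (m + 1)  # plain Morris-Pratt borders
    k = -1
    for j in range(m):
        while k >= 0 and p[k] != p[j]:
            k = border[k]
        k += 1
        border[j + 1] = k
    strong = [-1] * (m + 1)
    for j in range(m):
        b = border[j]
        # Knuth's improvement: skip a border that would face the same mismatch.
        strong[j] = strong[b] if b >= 0 and p[b] == p[j] else b
    strong[m] = border[m]
    return strong


def kmp_search(t, p):
    """Return (occurrence positions, number of character comparisons)."""
    nxt, m = kmp_table(p), len(p)
    occurrences, comparisons, k = [], 0, 0
    for i, c in enumerate(t):
        while k >= 0:
            comparisons += 1
            if p[k] == c:
                break
            k = nxt[k]
        k += 1
        if k == m:
            occurrences.append(i - m + 1)
            k = nxt[m]
    return occurrences, comparisons


def ratio_range(n, p, trials, seed=0):
    """Min and max of C_n / n over random i.i.d. texts on {a, b}."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        t = "".join(rng.choice("ab") for _ in range(n))
        ratios.append(kmp_search(t, p)[1] / n)
    return min(ratios), max(ratios)


for n in (100, 1_000, 10_000):
    low, high = ratio_range(n, "abab", trials=20)
    print(f"n={n:>6}: C_n/n in [{low:.3f}, {high:.3f}]")
```

One would expect the printed interval to narrow as $n$ grows, in line with the theorem; the exact values depend on the seed.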
Abstract. Private information retrieval (PIR) schemes enable users to obtain information from databases while keeping their queries secret from the database managers. We propose a new model for PIR, utilizing auxiliary random servers to provide privacy services for database access. In this model, prior to any on-line communication where users request queries, the database engages in an initial preprocessing setup stage with the random servers. Using this model we achieve the first information theoretic PIR solution in which the database does not need to give away its data to be replicated, and with minimal on-line computation cost for the database. This solves privacy and efficiency problems inherent to all previous solutions.

In particular, all previous information theoretic PIR schemes required multiple replications of the database into separate entities which are not allowed to communicate with each other; and in all previous schemes (including ones which do not achieve information theoretic security), the amount of computation performed by the database on-line for every query is at least linear in the size of the database. In contrast, in our solutions the database does not give away its contents to any other entity; and after the initial setup stage, which costs at most $O(n \log n)$ in computation, the database needs to perform only $O(1)$ computation to answer questions of users on-line. All the extra on-line computation is done by the auxiliary random servers.



1 Introduction 

Private Information Retrieval (PIR) schemes provide a user with information from a database in a private manner. In this model, the database is viewed as an $n$-bit string $x$ out of which the user retrieves the $i$-th bit $x_i$, while giving the database no information about his query $i$. The notion of PIR was introduced in [10], where it was shown that if there is only one copy of the database available then $\Omega(n)$ bits of communication are needed (for information theoretic user privacy). However, if there are $k \ge 2$ non-communicating copies of the database, then there are solutions with much better (sublinear) communication complexity. Symmetrically Private Information Retrieval (SPIR) [11] addresses the database's privacy as well, by adding the requirement that the user, on the other hand, cannot obtain any information about the database in a single query except for a single physical value.

* An earlier version of this work appears as MIT technical report MIT-LCS-TR-715. This work was done with the support of DARPA grant DABT63-96-C-0018.

Two major problems arise with all existing solutions: firstly, in order to achieve in- 
formation theoretic privacy all previous solutions call for replicating the database into 
several non-communicating copies, which constitutes a serious privacy problem (as dis- 
cussed below); and secondly, even though the communication complexity is sublinear, 
the amount of computation that the database engages in is linear in the size of the database 
for every query of the user. It seems unreasonable to expect a commercial database to distribute copies of its data to non-communicating entities and to perform a linear amount of computation per single query solely for the purpose of the user's privacy.

In this paper, we introduce a new model for PIR (or SPIR), which allows us to achieve 
significant improvements both in terms of security (circumventing the replication prob- 
lem) and in terms of computational complexity. 

The first enhancement to the PIR model is the use of auxiliary random servers, whose 
contents are independent of the contents of the database. This separates the task of in- 
formation retrieval from the task of providing privacy. We use only a single copy of the 
original data (the database owner itself), who does not engage in any complex privacy 
protocols, while all the privacy requirements are achieved utilizing the random servers, 
who do the work instead of the database. The random servers do not gain any informa- 
tion about the database or the user in the process (this is in contrast to the old model, 
where a database who wants to hire an agent to do all the privacy work for it must give 
away all its information to that agent). 

The second enhancement to the model is that we divide the PIR computation into
two stages: the setup stage, which takes place ahead of query time and does not involve 
the user, and the on-line stage, during which the user performs his various queries. The 
purpose of this split of computation is to allow much of the computation to be done once 
ahead of time, so that during the on-line stage the database is required to engage in min- 
imal computation and communication. 

Using this model, we construct straightforward and efficient protocols for solving 
the two problems described above. We achieve information theoretic privacy without 
data replication, and we minimize the on-line computation required from the database. 

Below we describe these problems and their solutions in more detail. 

1.1 Problems With The Previous PIR Model 

Protocols for PIR and SPIR schemes, guaranteeing information theoretic privacy, ap- 
peared in [10, 3, 15, 11]. These solutions are based on the idea of using multiple copies of 
the database that are not allowed to communicate with each other. This allows the user to 
ask different questions from different copies of the database and combine their responses 
to get the answer to his query, without revealing his original query to any single database 
(or a coalition). The recent PIR scheme of [14] uses a single database, but guarantees 
only computational privacy under the assumption that distinguishing quadratic residues 
from non-residues modulo composites is intractable. In fact, it was shown in [10] that using a single database makes it impossible to achieve information theoretic privacy with sublinear communication complexity.
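The multi-copy idea can be illustrated with the simplest two-copy scheme in the spirit of [10]; this is a sketch, not one of the sublinear schemes cited above. The user sends a uniformly random subset $Q$ of positions to one copy and $Q \triangle \{i\}$ to the other, each copy returns the xor of the selected bits, and the xor of the two answers is $x_i$. Each copy alone sees a uniformly random subset, so it learns nothing about $i$. Note, however, that communication here is linear and each copy does linear work for every query, which is the kind of database cost this paper sets out to eliminate. Function names are hypothetical.

```python
import secrets


def make_queries(n, i):
    """User side: a uniformly random subset Q (as a 0/1 mask) for copy 1,
    and Q with position i flipped (the symmetric difference) for copy 2."""
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = list(q1)
    q2[i] ^= 1
    return q1, q2


def answer(x, q):
    """Database-copy side: xor of the data bits selected by the mask q.
    This costs Theta(n) work for every single query."""
    a = 0
    for bit, selected in zip(x, q):
        a ^= bit & selected
    return a


def retrieve(copy1, copy2, i):
    """Run the two-copy scheme and reconstruct x_i = answer1 xor answer2."""
    q1, q2 = make_queries(len(copy1), i)
    return answer(copy1, q1) ^ answer(copy2, q2)


x = [1, 0, 1, 1, 0, 0, 1, 0]
print([retrieve(x, x, i) for i in range(len(x))])  # [1, 0, 1, 1, 0, 0, 1, 0]
```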

Unfortunately, the common paradigm behind all the solutions that guarantee information theoretic privacy, namely the replication of the database in multiple separated locations, introduces a serious privacy problem for the database: the data replication problem. That is, the database owner is required to distribute its data among multiple foreign
entities, each of which could be broken into, or could use the data and sell it to users 
behind the legitimate owner’s back. This is particularly problematic since the database 
cannot communicate with any of the other copies. Since this replication is used to pro- 
tect the user’s interest, it is doubtful that real world commercial databases would agree 
to distribute their data to completely separated holders which they cannot communicate 
with. Viewed from the user’s standpoint, it may be doubtful that users interested in pri- 
vacy of their queries would trust copies of the same database not to communicate with 
each other. 

Secondly, the paradigm used in all existing PIR schemes requires the database to 
actively participate in a complex protocol in order to achieve privacy. The protocol is 
complex both in terms of the computation necessary for the database to perform in or- 
der to answer every question of the user, and in the less quantifiable lack of simplicity, 
compared to the classical lookup-the-query-and-answer approach. 

In particular, in all of the existing solutions (whether using a multiple number of 
databases or a single one), each database performs a computation which is at least linear 
in the size of the database in order to compute the necessary answer for each question of 
the user. This is in contrast to the user’s computation and the communication complex- 
ity, which are at most sublinear per query. In the single database case (computational pri- 
vacy) the complexity of the database computation is a function of both the size n of the 
database and the size of the security parameter underlying the cryptographic assumption 
made to ensure privacy. Specifically, in the single database solution of [14], the compu- 
tation takes a linear number of multiplications in a group whose size depends on the 
security parameter chosen for the quadratic residuosity problem.^ Again, the overhead in computational complexity and the lack of simplicity of existing schemes make them unlikely to be embraced as a solution by databases in practice.

1.2 New Approach: The Random Server Model for PIR 

We introduce a new model for PIR, which allows for information theoretic privacy while 
eliminating the problems discussed above. Since it is impossible to achieve information theoretic privacy with sublinear communication complexity using a single database ([10]), we must still use a multiple database model. The crucial difference is that the multiple
databases are not copies of the original database. Rather, they hold auxiliary random 
strings provided by, say, WWW servers for this purpose. These auxiliary servers contain 
strings each of which cannot be used on its own^ to obtain any information about the 
original data. Thus, an auxiliary server cannot obtain information about the data, sell it 
to others, or use it in any other way. Instead, they may be viewed as servers who are 
selling security services to ordinary current day databases. 

The database owner, after engaging the services of some servers for the purpose of 
offering private and secure access to users, performs an initial setup computation with 

^For example, to achieve communication complexity $O(n^{\epsilon})$ the security parameter is of size $O(n^{\epsilon})$ and the number of multiplications is $O(n)$.

^Or in extended solutions, in coalition with others (the number of which is a parameter determined by the database).
the auxiliary servers. The servers are then ready to assist users in retrieving information from the database owner efficiently and privately during the on-line stage. Periodic re-initialization (setup stage) may be required at some frequency specified by the protocol. Typically this frequency will be once in a large number of queries (e.g. sublinear), or, if no periodic re-setup is required, then only when the database needs to be updated.^

We differentiate between two kinds of random servers: universal and tailored. Uni- 
versal random servers are servers whose contents may be determined in advance, even 
before the setup stage, without any connection to a particular database. Tailored random 
servers are those who store some random string specific to the database they are serving, 
namely those whose content is determined during the setup stage. 

One of the parameters of a PIR scheme is how many servers of each kind are re- 
quired. Clearly, universal servers are preferable, since they can be prepared in advance 
and therefore are more efficient and more secure. Moreover, since they do not need to 
store any data specific to the database they are serving, they could potentially be used 
for multiple databases at the same time. Indeed, our strongest definition of privacy (total 
independence, below) requires that all servers involved are universal. 

We define two new kinds of privacy for the database in this setting (formal definitions are in the next section): independence and total independence.

Independence informally means that no server can get any information about the 
original data of the database owner. Thus, the real data is distributed among all the servers 
in a private way, so that no single one gets any information about it (this can be generalized to $t$-independence, for any coalition of up to $t$ servers).

Total independence informally means that even all the auxiliary servers jointly do 
not contain any information about the original data (namely all servers are universal). 

Clearly, total independence implies independence. Indeed the solutions we propose 
to address the latter are simpler than the ones to address the former. 

1.3 Our Results 

We provide general reductions, starting from any PIR scheme, to schemes that achieve 
independence or total independence and low database computation complexity, while 
maintaining the other privacy properties of the underlying starting scheme (namely user 
privacy and database privacy). The on-line database computation complexity is reduced to a simple $O(1)$ look-up-the-query computation or, for some of our schemes, to no computation at all. Instead, the servers assume responsibility for all computations required
for privacy in the starting scheme. The user computation complexity stays the same as in 
the starting scheme, and (using existing solutions) it is already bounded by the commu- 
nication complexity (sublinear). Therefore, we concentrate on reducing the database’s 
computation, which in all previous schemes has been at least linear. 

Let us describe our results. 

Let $S$ be a PIR scheme which requires $k$ copies of the database^ and has communication complexity $C_S$. We provide two sets of schemes (all terms used below are defined in section 2).

^Note that also in the old replication model, reinitialization is required when the database changes.

^Note that creating $k$ copies of the database may be viewed as a setup stage of complexity $O(n)$.
Schemes Achieving Independence We state the result both for the interesting special case of $t = 1$ (i.e. independence) and for the most general case of any $t$.

- A scheme achieving independence and maintaining the other privacy properties of $S$. The scheme uses $k$ tailored and $k$ universal servers, with communication complexity $O(C_S)$, and no database participation in the on-line stage (i.e. no computation).

- A scheme achieving $t$-independence (for any $t \ge 1$) and maintaining the other privacy properties of $S$. The scheme uses $k$ tailored and $tk$ universal servers, with communication complexity $(t+1)C_S$, and no database participation in the on-line stage.

Setup stage: The complexity of the setup stage is $O(n)$. The number of tailored servers, which need to obtain some string during the setup stage, is $k$ (the same as in the starting scheme $S$).

Schemes Achieving Total Independence There are two variants here. 

- A (basic) scheme achieving total independence and database privacy, and maintaining user privacy up to equality between repeated queries.^ The scheme uses $\max(k, 2)$ universal servers and the database owner, with at most $O(C_S \log n)$ communication complexity, and $O(1)$ database computation complexity.

- A scheme achieving total independence and maintaining the other privacy properties of $S$ (in particular complete user privacy). The scheme uses $\max(k, 2)$ universal servers and the database owner, with at most $O((m + C_S)\log n)$ communication complexity, where the servers and the database need to engage in a re-setup after every $m$ queries. The database computation is $O(1)$.

Setup stage: The complexity of the setup stage is $O(n \log n)$. Note that all servers are universal, namely they could be prepared ahead of time, and they do not change during the setup stage.

Tradeoff between the two versions: In the basic version the database can detect 
repeated queries, but cannot gain any other information about the user’s queries. This is 
dealt with in the final scheme, where total independence is achieved preserving complete 
user privacy. The price for eliminating detection of repeated queries is that re-setup has 
to be performed every m queries. The value of m, the frequency of reinitialization, is a 
parameter chosen to optimally trade off the frequency and the communication complexity. A suitable choice for existing schemes is $m = C_S$, the communication complexity of a single query, so that the overall communication complexity does not increase by more than a logarithmic factor, and yet a sublinear number of queries can be made before reinitialization. Choosing between the two versions should
depend on the specific application: preventing the database from detecting equal ques- 
tions, or avoiding reinitialization. Note also that the first version adds database privacy 
even if the underlying S was not database private. 
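The suggested parameter choice can be checked directly against the stated bound (a back-of-the-envelope calculation, with constants hidden in the $O(\cdot)$ notation): substituting $m = C_S$ into the per-query communication of the second scheme gives

```latex
O\big((m + C_S)\log n\big)\Big|_{m = C_S} = O(2\,C_S \log n) = O(C_S \log n),
```

while the number of queries between two re-setups, $m = C_S$, is sublinear in $n$ whenever the underlying scheme $S$ has sublinear communication.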

Main Idea: Note that total independence guarantees that all the auxiliary servers 
jointly do not contain any information about the data. So how can they assist the database 

^Namely, the only information that the database can compute is whether this query has been made before.
at all? The idea is that during the setup stage, a setup protocol is run amongst the database
and the universal random servers, at the end of which the database is the one which 
changes appropriately to ensure the privacy and correctness properties for the on-line 
stage. During the on-line stage, as before, the user communicates with the random servers 
and with the database to privately extract the answer to his query. 

1.4 Related Work 

PIR was originally introduced by [10], who were only concerned with protecting the 
user’s (information theoretic) privacy. In particular, for a constant number k of database 
copies, [10], with further improvement in [3] (for the case $k > 2$), achieve information theoretic user security with communication complexity $O(n^{1/(2k-1)})$, where $n$ is the length of the data (in bits).

Recently, [11] extended PIR to SPIR (Symmetrically Private Information Retrieval),
where the privacy of the data (with respect to the user) is considered as well. They use 
a model where the multiple databases may use some shared randomness, to achieve re- 
ductions from PIR to SPIR, paying a multiplicative logarithmic factor in communication 
complexity. 

The work in [9] considers computational privacy for the user, and achieves a 2-database scheme with communication complexity $O(n^{\epsilon})$ for any $\epsilon > 0$, based on the existence of one-way functions. As mentioned earlier, [14] relies on a stronger computational assumption, the quadratic residuosity problem, to achieve a 1-database PIR scheme with computational privacy and communication complexity $O(n^{\epsilon})$ for any $\epsilon > 0$.

The work in [15] generalizes PIR for private information storage, where a user can 
privately read and write into the database. This model differs from ours, since in our 
model users do not have write access into the database. Still, some connection between 
the two models can be made, since one might consider a storage model where the first 
n operations are restricted to be private write (performed by the database owner), and 
all operations thereafter are restricted to be private reads (by users). This results in a 
model compatible with our model of independence (although this approach cannot lead to total independence). We note that [15] independently^ use a basic scheme which is
essentially the same as our basic RDB scheme of section 3.1. However, they use the 
scheme in their modified model (where users have write access), and with a different 
goal in mind, namely that of allowing users to privately read and write into the databases. 

None of the above PIR and SPIR works consider the data replication problem. 

Recently, independently of our work, [7] suggested the commodity based model
for cryptographic applications, which relies on servers to provide security, but not to 
be involved in the client computations. Although this model is related to ours we stress 
here some important differences. First, their model engages the servers only in the task 
of sending one message (commodity) to each client, without any interaction. In con- 
trast, our model stresses the interaction of the servers with the clients for the purpose of 
reducing the computational complexity. Second, our model, unlike theirs, is designed 
specifically for PIR, and solves problems which were not previously addressed. 



^Our results of section 3 were actually obtained prior to the publication of [15].
Organization Section 2 introduces the relevant definitions and notation used. In sec-
tion 3 we describe schemes achieving independence, and in section 4 we describe schemes 
achieving total independence. 

2 Notation and Definitions 

The Information Retrieval Model: The data string is a string of $n$ bits denoted by $x = x_1, \ldots, x_n$. The user's query is an index $i \in \{1, \ldots, n\}$, which is the location of the bit
the user is trying to retrieve. The database (also referred to as the original database, or 
the database owner) is denoted by D. 

An information retrieval scheme is a protocol consisting of a setup stage and an on- 
line stage. During the setup stage, the auxiliary random servers are chosen, and possibly 
some other setup computation is performed. During the on-line stage, a user interacts 
with the servers and possibly also with the original database in order to obtain his query. 

At the end of the interaction, the user should have the bit $x_i$. In all our schemes, the on-line stage consists of a single round.

The Random Servers: There are two kinds of auxiliary servers: Universal and Tailored. 
The universal servers contain completely random data that can be prepared ahead of time 
independently of the particular database in mind. The tailored servers on the other hand 
are each independent of the database, but their content should be prepared for a partic- 
ular database (during the setup stage), since the combination of all servers together is 
dependent on the specific database. One of the parameters for an information retrieval 
scheme is how many servers of each kind are required. 

We require that all servers are separate, in the sense that they are not allowed to com- 
municate with each other. We also address the case where up to t of the servers are faulty 
and do communicate with each other. 

Notions of Privacy: We define the following privacy properties for an information re- 
trieval scheme. 

User privacy [10]: No single database (or server) can get any information about the 
user’s query i from the on-line stage. That is, all the communication seen by a single 
database is identically distributed for every possible query. This definition can be extended to user $l$-privacy, where all communication seen by any coalition of up to $l$ databases (servers) is identically distributed for every possible query.

Database privacy [11]: The user cannot get any information about the data string other 
than its value in a single location. That is, all the communication seen by the user in the 
on-line stage is dependent on a single physical bit $x_i$ (so it is identically distributed for any string $x'$ such that $x_i = x'_i$).

Independence: No auxiliary server has any information about the data string x. That is, 
the content of the auxiliary server is identically distributed for any data string x. This 
definition can be extended to t-independence, where no coalition of up to t servers has 
any information about $x$ (thus, independence is the special case of 1-independence).

Total independence: All the auxiliary servers jointly have no information about the data string $x$, or equivalently all the servers are universal. That is, they are completely independent of the original data, and thus may all be chosen in advance.

In all the above definitions, information theoretic privacy may be relaxed to computational privacy, requiring indistinguishability instead of identical distribution.
Protocols: A private information retrieval (PIR) scheme is one that achieves user
privacy, and a symmetrically private information retrieval (SPIR) scheme is one that 
achieves user privacy and database privacy. These two protocols were defined in [10, 
11], respectively. In this paper we will show how to incorporate the independence or 
total independence properties into PIR and SPIR schemes. 

Complexity: We define the communication complexity of an information retrieval 
protocol to be the total number of bits sent between the user, the database, and the servers. 
The computation complexity (of a user/database/server during setup/on-line stage) is the 
amount of computation that needs to be performed by the appropriate party before the 
required communication can be sent. Note that we count sending bits from a specific lo- 
cation towards communication complexity, rather than computation complexity. Com- 
munication and computation complexity during the on-line stage refer to the complexity 
per each (single) query. 

3 Achieving Independence: The RDB Scheme 

In this section we describe a simple and efficient scheme, which takes advantage of the 
random server model to achieve $t$-independence and no database participation in the on-line stage. Specifically, we prove the following theorem.

Theorem 1. Given any information retrieval scheme $S$ which requires $k$ copies of the database and communication complexity $C_S$, and for every $t \ge 1$, there exists an information retrieval scheme achieving $t$-independence and maintaining the other privacy properties (user privacy and data privacy) of $S$. The $t$-independent scheme requires $(t+1)C_S$ communication complexity and $(t+1)k$ servers, out of which only $k$ are tailored. The setup complexity is $O(n)$ and the database is not required to participate in the on-line stage.

An immediate useful corollary follows by setting $t = 1$:

Corollary 1. Given any information retrieval scheme S which requires k copies of the 
database, there exists an information retrieval scheme achieving independence and main- 
taining the other privacy properties of S, which requires a factor of 2 in communication 
complexity, and uses $k$ tailored servers and $k$ universal ones. The setup complexity is $O(n)$ and the database is not required to participate in the on-line stage.

The basic version of our reduction (the RDB scheme) is described in section 3.1. In section 3.2 we present another version, possessing some appealing extra properties for security and simplicity. We note, however, that the starting point for the second reduction is any information retrieval scheme which has a linear reconstruction function. This is usually the case in existing PIR schemes (cf. [10, 3]). Finally, in section 3.3 we prove that the RDB construction satisfies Theorem 1.

Another benefit of our scheme is that it does not require the participation of the database 
owner D after the setup stage. Instead, the servers deal with all the on-line queries and 
computations. Even though $D$ is not present during the on-line stage, he is guaranteed that none of the servers who are talking to users on his behalf has any information about his data $x$.
3.1 The Basic RDB Scheme

In the basic RDB (random data base) scheme, instead of replicating the original database as in the underlying scheme, every copy is replaced by $t + 1$ random servers whose contents xor to the contents of the database. The idea behind this replacement is that if these $t + 1$ databases are chosen uniformly at random with this property, then any coalition of $t$ of them holds simply a random string, independent of the actual original data string. Therefore, $t$-independence is achieved. We proceed with the details of the basic reduction. The communication complexity and privacy properties of this scheme will be proved in section 3.3.
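The claim that any $t$ of the $t + 1$ xor-shares form a random string independent of $x$ can be checked exhaustively for tiny parameters. The sketch below assumes the first $t$ shares are chosen uniformly and the last one completes the xor; it enumerates every choice and compares, for each coalition of $t$ shares, the distribution of its joint view across all data strings $x$. It is a finite sanity check, not a proof, and all names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product


def xor_strings(u, v):
    return tuple(a ^ b for a, b in zip(u, v))


def shares_from(x, free):
    """Complete t free strings R_1..R_t with R_{t+1} so that all t+1 xor to x."""
    last = tuple(x)
    for r in free:
        last = xor_strings(last, r)
    return list(free) + [last]


def coalition_views(x, t):
    """For each coalition of t of the t+1 shares, count how often each joint
    view occurs when R_1..R_t range over all n-bit strings."""
    strings = list(product((0, 1), repeat=len(x)))
    views = {}
    for free in product(strings, repeat=t):
        shares = shares_from(x, free)
        for coalition in combinations(range(t + 1), t):
            view = tuple(shares[j] for j in coalition)
            views.setdefault(coalition, Counter())[view] += 1
    return views


def is_t_independent(n, t):
    """True iff every coalition's view distribution is identical for all x."""
    data = list(product((0, 1), repeat=n))
    reference = coalition_views(data[0], t)
    return all(coalition_views(x, t) == reference for x in data[1:])


print(is_t_independent(n=2, t=1), is_t_independent(n=2, t=2))  # True True
```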

Let the underlying scheme $S$ be a PIR scheme with $k$ copies of the database.

Setup Stage The database owner $D$ chooses uniformly at random $t + 1$ random servers $R_1, \ldots, R_{t+1} \in \{0,1\}^n$, such that for every $1 \le j \le n$, $R_1(j) \oplus \cdots \oplus R_{t+1}(j) = D(j) = x_j$, i.e., the xor of all the servers is the original data string $x$. This is done by choosing $t$ universal servers and computing the content of one additional, tailored, server in an appropriate way. A protocol to do that is described in the appendix.

Each of these servers is then replicated k times, for a total of k(t + 1) servers. 

Thus, at the end of the setup stage, the random servers are

$R_1^1, \ldots, R_1^k, \; \ldots, \; R_{t+1}^1, \ldots, R_{t+1}^k,$

where $R_s^1 = \cdots = R_s^k$ for every s, and where $R_1^r \oplus R_2^r \oplus \cdots \oplus R_{t+1}^r = x$ for
every r.

On-Line Stage During the on-line stage, the user executes the underlying scheme S a total of
t + 1 times, each time with a different set of k databases. The first execution is with the k
copies of $R_1$, which results in the user obtaining $R_1(i)$. The second execution is with
the k copies of $R_2$, resulting in the retrieval of $R_2(i)$, and so on. Finally, the user xors
all the t + 1 values he retrieved, $R_1(i) \oplus \cdots \oplus R_{t+1}(i) = D(i) = x_i$, in order to obtain
his desired value $x_i$.
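The basic RDB scheme can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: as a stand-in for S we assume the textbook two-server xor-based PIR scheme (k = 2; the user sends a random subset Q of positions to one copy and Q with index i toggled to the other, and each copy answers with the xor of the selected bits). The helper names (`xor_shares`, `pir_queries`, `rdb_setup`, `rdb_retrieve`) are hypothetical, and D generates the shares directly rather than through the protocol in the appendix. The user reuses the same questions for every set, the variant preferred in the text.

```python
import secrets


def xor_shares(x, parts):
    """Split bit list x into `parts` uniformly random bit lists whose xor is x."""
    shares = [[secrets.randbelow(2) for _ in x] for _ in range(parts - 1)]
    last = list(x)
    for s in shares:
        last = [a ^ b for a, b in zip(last, s)]
    return shares + [last]  # the last share plays the role of the tailored server R_{t+1}


def pir_queries(n, i):
    """Assumed stand-in for S: a random subset and the same subset with index i toggled."""
    q1 = {j for j in range(n) if secrets.randbelow(2)}
    return q1, q1 ^ {i}


def pir_answer(data, q):
    """One copy's answer: the xor of the bits of `data` at the positions in q."""
    ans = 0
    for j in q:
        ans ^= data[j]
    return ans


def rdb_setup(x, t):
    """t + 1 random servers whose xor is x, each replicated k = 2 times."""
    return [[list(R), list(R)] for R in xor_shares(x, t + 1)]


def rdb_retrieve(replicated, i):
    """Run S once per set (same questions every time) and xor the t + 1 results."""
    n = len(replicated[0][0])
    q1, q2 = pir_queries(n, i)
    bit = 0
    for copy1, copy2 in replicated:
        bit ^= pir_answer(copy1, q1) ^ pir_answer(copy2, q2)  # one run of S yields R_s(i)
    return bit
```

For example, `rdb_retrieve(rdb_setup(x, t=3), i)` returns `x[i]` while no set of at most three distinct servers holds anything correlated with `x`.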

Note that the user can perform all these t + 1 executions of S in parallel. Also, the user
may either perform all these parallel executions independently, or simply use exactly the
same questions in all of them. Our proofs cover both variants, but we prefer
the latter since it simplifies the proof of user privacy against coalitions. However, in
the most general case, if S is a multi-round scheme with adaptive questions, we must
use the first strategy of independent executions.

Remarks Note that out of the k(t + 1) servers, all but k are universal servers, which can
be prepared ahead of time, whereas the other k (the copies of $R_{t+1}$) are tailored.

Note also that our scheme uses replication of the random servers.
At first glance, this may seem to contradict our goal of solving the data replication
problem. However, in contrast to replicating the original database, replicating random servers
does not pose any threat to the original data string which we are trying to protect. Thus,
we manage to separate user privacy, which requires replication, from database
privacy, which requires that the data not be replicated. Still, in the next section we describe
a version in which there is no replication, not even of the auxiliary servers, and which
consequently provides a higher level of privacy, as discussed below.
3.2 The RDB Scheme: Improved Variant

While the basic scheme does achieve t-independence (as no coalition of t servers has 
any information about x), some of the servers there are replications of each other. 

Here, we propose an improvement to the basic scheme, in which a higher level of
independence among the random servers is achieved, allowing for more flexibility in
choosing the random servers from different providers. Specifically, we achieve t-independence
among the servers themselves, namely the contents of every t servers are mutually independent
(in particular, there is no replication of the servers).⁷ Another benefit of this scheme
over the basic one is that, while t is still the maximal size of a coalition that the database
is secure against, it is also secure against many larger coalitions: any coalition that does
not include all t + 1 servers of a single copy learns nothing about x. This protocol works
provided that the underlying PIR scheme has a linear reconstruction function (see 3.3), a
quite general requirement that is satisfied by currently known PIR schemes.

Setup Stage Recall that in the basic version, we created t + 1 servers and replicated each
of them k times, thereby creating t + 1 sets, each of which consists of k identical servers.

In this protocol, the k servers in every set will be independent random strings, instead of
replications. Specifically, the database owner D chooses uniformly at random k(t + 1)
servers $R_1^1, \ldots, R_{t+1}^1, \ldots, R_1^k, \ldots, R_{t+1}^k$, with the property that $R_1^r \oplus \cdots \oplus R_{t+1}^r = x$
for every $1 \le r \le k$.

As in the basic scheme, kt of these servers are universal, and k are tailored. The
contents of the tailored servers are computed by D using the same protocol as in the basic
scheme (see appendix).

On-Line Stage During the on-line stage, the user sends his queries (according to the
underlying S) to each of the servers, where $\{R_1^r, \ldots, R_{t+1}^r\}$ correspond to the r-th copy
of the database in the underlying scheme S. After receiving the answers from all the
k(t + 1) servers, the user xors the answers of $R_1^r, \ldots, R_{t+1}^r$ for each r to obtain the
answer of the r-th copy in S, and combines these answers as in S to obtain his final
value $x_i$.

The difference between this version and the basic version is the following. In the
basic scheme, the user first runs S to completion with each of the t + 1 sets of servers
(for example, one set is $R_1^1, \ldots, R_1^k$), giving the user t + 1 values that he can xor
together to obtain the value of the primary database. In contrast, here the user
first combines answers by xoring the values he received (for example, from $R_1^1, \ldots, R_{t+1}^1$)
in the middle of the S protocol, which gives the user the intended answer of each copy of
the database, and only then combines the answers as needed in S.

Thus, to succeed in this version, the underlying S must have the following closure
property under xor: if $f_r(x, q)$ is the function used by the r-th copy of the database
in S to answer the user's query q with the data string x, then for any $y_1, \ldots, y_m$,
$f_r(y_1, q) \oplus \cdots \oplus f_r(y_m, q) = f_r(y_1 \oplus \cdots \oplus y_m, q)$. This may be generalized to any
underlying scheme with a linear reconstruction function. This requirement is very general,
and is usually satisfied by existing PIR protocols (for example, protocols based on xoring
subsets of locations in the data string, such as [10, 3], the best PIR schemes known
to date).

⁷ Moreover, if we assume that the original data x is randomly distributed, then the servers are
(2t + 1)-independent.
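The closure property and the "xor in the middle of S" step can be checked with a small Python sketch. It assumes, purely for illustration, that S is the two-server xor-subset PIR scheme (k = 2), whose answer function is the xor of the selected bits and is therefore linear in the data; the function names are hypothetical.

```python
import secrets


def random_bits(n):
    return [secrets.randbelow(2) for _ in range(n)]


def subset_xor(data, q):
    """Answer function f_r(data, q) of the xor-subset PIR: linear in `data` over GF(2)."""
    acc = 0
    for j in q:
        acc ^= data[j]
    return acc


def improved_setup(x, t, k=2):
    """k copies, each made of t + 1 fresh random strings whose xor is x (no replication)."""
    rows = []
    for _ in range(k):
        row = [random_bits(len(x)) for _ in range(t)]
        last = list(x)
        for s in row:
            last = [a ^ b for a, b in zip(last, s)]
        rows.append(row + [last])
    return rows


def improved_retrieve(rows, i):
    """Assumes k = 2 and the two-server xor-subset PIR as the underlying S."""
    n = len(rows[0][0])
    q1 = {j for j in range(n) if secrets.randbelow(2)}
    queries = [q1, q1 ^ {i}]
    copy_answers = []
    for row, q in zip(rows, queries):
        a = 0
        for server in row:
            a ^= subset_xor(server, q)  # xor the t + 1 answers in the middle of S
        copy_answers.append(a)          # equals f_r(x, q) by the closure property
    return copy_answers[0] ^ copy_answers[1]  # S's own reconstruction step
```

Because each copy is now split into independent strings, a coalition containing, say, $R_1^1$ and $R_1^2$ sees two unrelated random strings, unlike the replicated servers of the basic scheme.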

3.3 Analysis of the RDB Scheme: Proof of Theorem 1 

We now analyze the RDB scheme in terms of complexity, correctness, and privacy, to 
show that it satisfies the bounds given in theorem 1. 

The RDB scheme requires a multiplicative factor of (t + 1) in communication
complexity over the underlying scheme S, since S is simply executed t + 1 times. Typically,
t is a constant ($t \ge 1$), which means the communication complexity of RDB is $O(C_S)$,
where $C_S$ is the communication complexity of S. The number of tailored servers
required is the same as the number of databases required in S, since all the tuples
$R_1, \ldots, R_t$ can be prepared in advance, and then xored with the original data to
produce $R_{t+1}$. Thus, one tailored server is needed per copy of the database in the
underlying S.

It is not hard to check that the scheme gives the user the correct value $x_i$; this follows
from the way the servers were chosen and from the correctness of S.

User privacy properties carry over from S: if S is user-ℓ-private (i.e., user
private against coalitions of up to ℓ databases), then so is the corresponding RDB scheme
(where user privacy is protected from any coalition of ℓ servers). This is clear for
coalitions involving servers from the same set $R_s^1, \ldots, R_s^k$ for some s, since the user simply
runs S with that set. This argument immediately extends to arbitrary coalitions if the user
sends exactly the same questions in all sets (i.e., in every execution of S).⁸ In the case of
parallel independent executions and a multi-round adaptive S, a little more care is needed
to show that the view of any coalition is independent of i, using the ℓ-user-privacy of S
inside sets and the independence of the executions across sets.

Database privacy of S also implies database privacy of the corresponding RDB scheme,
as follows. If S is database private (SPIR), then in each of the t + 1 parallel executions of S
the user gets at most one bit, and altogether the user gets at most t + 1 bits. Since the servers
are chosen uniformly at random among all tuples of strings that xor to x, it follows that if the
t + 1 bits are from the same location i in all servers, they are distributed uniformly over all
(t + 1)-tuples that xor to $x_i$, and otherwise the t + 1 bits are distributed uniformly
over all possible tuples. In any case, the user's view depends on at most one physical
bit of x, and database privacy is maintained.

Finally, the RDB scheme achieves t-independence, since any coalition of up to t servers
contains at most t distinct servers among $R_1, \ldots, R_{t+1}$, and thus (from the way the
auxiliary databases were defined) the coalition holds a string uniformly distributed over
all strings of the appropriate length, independent of x.

4 Achieving Total Independence: The Oblivious Data Scheme 

In this section we present a scheme for total independence PIR (or total independence 
SPIR), where all auxiliary servers are universal, i.e. jointly independent of the database. 
This scheme also achieves O(1) computation complexity for the database.

⁸ This strategy is always possible unless S is a multi-round adaptive scheme.
Overview of Results We first describe a basic version of our scheme, which achieves
total independence as well as database privacy, but maintains user privacy with one
exception: in repeated executions of the basic scheme, the database can tell whether the
questions in different executions correspond to the same index or not. We prove that
no other information about the content of the queries or the relations among them is
revealed to the database. We call this user privacy up to equality between repeated queries.
Thus, we prove the following theorem.

Theorem 2. Given any PIR scheme S which requires k copies of the database and
communication complexity $C_S$, there exists a total independence SPIR scheme, private for
the database and private for the user up to equality between repeated queries, which
uses max(k, 2) universal servers, and requires communication complexity of at most
$O(C_S \log n)$. The setup complexity is $O(n \log n)$, and the on-line computation
complexity of the database is O(1).

The scheme is described in section 4.1, and in section 4.2 we prove that it satisfies 
the theorem. 

Since whether users are asking the same question or not may, in some applications,
be important information whose disclosure violates user privacy, we present
a generalized version of our scheme in section 4.3, which completely hides all
information about the user queries, even after multiple executions. This scheme maintains the
privacy properties of the underlying scheme, namely it transforms a PIR scheme into
a total independence PIR scheme, and a SPIR scheme into a total independence SPIR
scheme. The price we pay for eliminating the equality leakage is that the setup stage
needs to be repeated every m queries, and an additive term of m log n is added to the
communication complexity, where m is a parameter of the scheme (see 4.3 for how to
choose m). Thus, we prove the following theorem.

Theorem 3. Given any information retrieval scheme S which requires k copies of the
database and communication complexity $C_S$, there exists a total independence
information retrieval scheme, maintaining the privacy properties (user privacy and database
privacy) of S, which uses max(k, 2) universal servers, and requires communication
complexity of at most $O((m + C_S) \log n)$, where m is the number of queries allowed before
the system needs to be reinitialized. The setup complexity is $O(n \log n)$, and the on-line
computation complexity of the database is O(1).

The following corollary is obtained by setting $m = n^{\epsilon}$ in the above theorem, where
$n^{\epsilon}$ is some polynomial equal to the communication complexity of the underlying PIR
scheme (it is conjectured in [10] that all information theoretic PIR schemes must have
communication complexity of at least $\Omega(n^{\epsilon})$ for some $\epsilon > 0$).

Corollary 2. Given any information retrieval scheme S which requires k copies of the
database and has communication complexity $O(n^{\epsilon})$, there exists a total independence
information retrieval scheme, maintaining the privacy properties of S, which uses max(k, 2)
universal servers, requires communication complexity of $O(n^{\epsilon} \log n)$, and has to be
reinitialized after every $O(n^{\epsilon})$ queries.

It is not clear which of the two schemes is better: the one achieving privacy up to equality
(plus database privacy), or the one achieving full privacy but with periodic setups.
This depends on the particular needs of the application.
The Main Idea: Oblivious Data Recall that in order to achieve information theoretic
PIR, multiple servers are required. On the other hand, in order to achieve
total independence PIR, all auxiliary servers must be (jointly) independent of the data. To
accommodate these two seemingly conflicting requirements we use the following idea.
During the setup stage, the database and the auxiliary servers create a new “oblivious”
string y which depends on the content of all of them. This string must be held by the
database D (since no other party may hold any data dependent on x). Thus, we let the
database change during the setup stage, rather than the servers. Later, during the on-line
stage, the user interacts with the servers to obtain information about the relation between
y and x. Knowing this information, the user can simply ask D for the value of y in an
appropriate location, whose relation to x he knows from communication with the servers,
which enables him to compute $x_i$. We call y an oblivious data string, since it should be
related to the data string x, yet in a way which is oblivious to its holder D, so that D
cannot relate the user's query in y to any query in x, and therefore learns nothing about
the user's interests from the user's query in y. Note that all the database's work is in the
setup stage (which amounts to only a logarithmic factor over the work that needs to be
done to replicate itself in the old model). During the on-line stage, all D needs
to do is to reply with a bit from the oblivious string, which requires no computation.

4.1 Basic Scheme 

Let the underlying scheme S be a PIR scheme with k copies of the database.

Setup Stage The (universal) auxiliary servers are k servers, each containing a random
string $r \in \{0,1\}^n$ and a random permutation $\pi : [1..n] \to [1..n]$ (represented by $n \log n$
bits in the natural way). D and two of the servers, $R_1$ and $R_2$, engage in a specific
multi-party computation, described below, at the end of which D obtains the oblivious data string

$y = \pi(x \oplus r)$

but no other information about r and $\pi$. No server obtains any new information
about x.

Naturally, by the general multi-party theorems of [6, 8], such setup-stage protocols
exist, but they are very expensive. Instead, we design a special-purpose, efficient one-round
protocol for this purpose.

The multi-party computation is done as follows: D chooses uniformly at random two
strings $x^1$ and $x^2$ such that $x^1 \oplus x^2 = x$. Similarly, $R_1$ chooses uniformly at random
$r^1, r^2$ such that $r^1 \oplus r^2 = r$, and $R_2$ chooses uniformly at random $\pi^1, \pi^2$ such that
$\pi^2 \circ \pi^1 = \pi$, where $\circ$ is the composition operator (that is, $\pi^2(\pi^1(\cdot)) = \pi(\cdot)$). The following
information is then sent between the parties on secure channels:

$R_2 \to R_1: \pi^1$;   $R_1 \to R_2: r^2$;   $R_1 \to D: v = \pi^1(r^1 \oplus x^1)$;
$D \to R_1: x^1$;   $D \to R_2: x^2$;   $R_2 \to D: \pi^2, \; u = \pi(r^2 \oplus x^2)$.

D can now compute $y = \pi^2(v) \oplus u = \pi(r^1 \oplus x^1) \oplus \pi(r^2 \oplus x^2) = \pi(r \oplus x)$ (“the
oblivious string”), where the last equality holds because permuting positions commutes with xor.
$R_1$ and $R_2$ discard all communication sent to them during the setup
stage, and need to maintain only their original independent content.
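The three-party setup computation can be simulated in Python to check that D indeed ends up with $\pi(x \oplus r)$. This is a sketch under stated assumptions: strings are lists of bits, a permutation `p` acts on a string by moving bit `i` to position `p[i]`, secure channels are modeled as plain variables, and all function names are hypothetical. `random.SystemRandom` is used by default; a seeded `random.Random` can be passed for reproducible experiments.

```python
import random


def apply_perm(p, s):
    """Permute a bit string: bit s[i] moves to position p[i]."""
    out = [0] * len(s)
    for i, bit in enumerate(s):
        out[p[i]] = bit
    return out


def compose(p2, p1):
    """p2 ∘ p1: apply p1 first, then p2."""
    return [p2[p1[i]] for i in range(len(p1))]


def inverse(p):
    inv = [0] * len(p)
    for i, target in enumerate(p):
        inv[target] = i
    return inv


def xor(a, b):
    return [u ^ v for u, v in zip(a, b)]


def setup_stage(x, r, pi, rng=random.SystemRandom()):
    """Simulate D, R1 and R2; returns the string D computes at the end."""
    n = len(x)
    x1 = [rng.randrange(2) for _ in range(n)]  # D: x = x1 ⊕ x2
    x2 = xor(x, x1)
    r1 = [rng.randrange(2) for _ in range(n)]  # R1: r = r1 ⊕ r2
    r2 = xor(r, r1)
    pi1 = list(range(n))                       # R2: pi = pi2 ∘ pi1, pi1 uniform
    rng.shuffle(pi1)
    pi2 = compose(pi, inverse(pi1))
    v = apply_perm(pi1, xor(r1, x1))           # R1 -> D (R1 got x1 from D, pi1 from R2)
    u = apply_perm(pi, xor(r2, x2))            # R2 -> D with pi2 (R2 got r2, x2)
    return xor(apply_perm(pi2, v), u)          # D: pi(r1 ⊕ x1) ⊕ pi(r2 ⊕ x2)
```

Note that $R_1$ sees only $x^1$ and $\pi^1$, and $R_2$ sees only $x^2$ and $r^2$, each of which is uniformly random on its own; together, however, $x^1 \oplus x^2 = x$, which is why the servers discard these messages.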

At the end of the setup stage the database D has two strings: x which is the original 
data string, and also y which is the oblivious data. The auxiliary servers contain the same 
strings as before the setup stage, and do not change or add to their content. 
On-Line Stage In the on-line stage the user first runs S (the underlying PIR scheme)
with the servers to retrieve the block $(j := \pi(i), r_i)$, as specified below (recall that $r_i$
is the random bit with which the user's desired data bit was masked, and that j is the
location of the masked bit in the oblivious data string). Then the user queries D for the
value $y_j$ at the j-th location. This is done by simply sending j to D in the clear, and
receiving the corresponding bit $y_j$ back. To reconstruct his desired bit, the user computes
$y_j \oplus r_i = [\pi(x \oplus r)]_j \oplus r_i = (x \oplus r)_i \oplus r_i = x_i$.
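The user's reconstruction can be illustrated with a short Python sketch. It assumes the same convention as before (bit `i` of a string is moved to position `pi[i]`, so that $y_{\pi(i)} = (x \oplus r)_i$) and, for brevity, reads $(\pi(i), r_i)$ directly instead of retrieving it privately through S; the names are hypothetical.

```python
def apply_perm(p, s):
    """y = pi(s): bit s[i] is placed at position p[i]."""
    out = [0] * len(s)
    for i, bit in enumerate(s):
        out[p[i]] = bit
    return out


def online_stage(i, pi, r, y):
    """Recover x_i from the oblivious string y = pi(x ⊕ r)."""
    # In the scheme, (j, r_i) is obtained from the servers through S; here it is read directly.
    j, r_i = pi[i], r[i]
    y_j = y[j]          # D's only work: return the bit at location j, sent in the clear
    return y_j ^ r_i    # [pi(x ⊕ r)]_j ⊕ r_i = (x ⊕ r)_i ⊕ r_i = x_i
```

D sees only the location j, which, for a uniformly random $\pi$ unknown to D, is a uniformly random position unrelated to i.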

Since S is a PIR scheme for retrieving a single bit, we need to specify how to retrieve
the required block. The most naive way is to apply S log n + 1 times, each time retrieving
a single bit out of the n(log n + 1) bits. However, this way does not necessarily maintain
database privacy, and is not the best in terms of communication complexity. This can
be improved by noticing that each of the log n + 1 bits required belongs to a different
set of n bits. Thus, the on-line stage can be performed by log n + 1 parallel applications
of S for one bit out of n. Further improvements are possible when methods for block
retrieval that are more efficient than bit-by-bit retrieval are available (cf. [10]).
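One way to picture the "different set of n bits" observation is to store the n blocks $(\pi(i), r_i)$ as $\lceil \log_2 n \rceil + 1$ separate n-bit strings, so that each run of S retrieves bit i of one string. The Python sketch below shows only this data layout; the table format and function names are assumptions for illustration, and the actual single-bit PIR runs are replaced by direct indexing.

```python
import math


def block_tables(pi, r):
    """Lay out the n blocks (pi(i), r_i) as ceil(log2 n) + 1 separate n-bit strings.

    Table b (for b < L) holds bit b of pi(i) at position i; the last table is r itself.
    """
    n = len(pi)
    L = max(1, math.ceil(math.log2(n)))
    tables = [[(pi[i] >> b) & 1 for i in range(n)] for b in range(L)]
    tables.append(list(r))
    return tables


def read_block(tables, i):
    """Stand-in for the parallel runs of S: each t[i] would be one single-bit PIR query."""
    bits = [t[i] for t in tables]
    j = sum(bit << b for b, bit in enumerate(bits[:-1]))
    return j, bits[-1]
```

Every table is indexed by the same position i, so each parallel run of S is a query for one bit out of n with the same index.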

Note that the computation complexity for the database here is minimal: O(1). In
fact, the only thing required from D is to send to the user a single bit from the specified
location.

Remarks Two questions may arise from our setup stage. First, can the setup stage be
achieved using only a single server and the database? This would change the required
number of servers in the scheme to k instead of max(k, 2). Second, and more important,
note that during our setup stage, if $R_1$ and $R_2$ collude, together they can find out the
data. So, to guarantee total independence, they should discard the communication sent
to them during the setup stage.⁹ Can we construct a different protocol for the setup stage
which avoids this problem? The following lemma helps us answer the above questions,
by showing that it is impossible to achieve our setup stage with only two parties.

Before stating the lemma, let us informally describe what it means for a two-argument
function to be privately computable, in the information theoretic model and with honest
players (for formal definitions and treatment see [17, 6, 8, 13]). A function
$f([x_1], [x_2]) = ([y_1], [y_2])$ is privately computable if there exists a protocol for two players
$P_1, P_2$, as follows. At the beginning of the protocol $P_1$ holds $x_1$ and $P_2$ holds $x_2$; during
the protocol the players may flip coins and alternately send messages to each other. At the end
of the protocol, each player $P_i$ (i = 1, 2) can use the communication and his input $x_i$ to
reconstruct his output $y_i$ (correctness), but cannot obtain any other information about the
other party's input that does not follow from his own input and output (privacy).

Lemma 1. The two-argument function $f([\pi, r], [x]) = ([0], [\pi(x \oplus r)])$ is not privately
computable (in the information theoretic model).

Proof. See appendix. □ 

From the lemma, it is clear that our setup stage cannot be achieved with a single
server and the database, since a multi-party computation is needed (rather than a
two-party computation). This is not a real problem, though, since if we want information
theoretic privacy, we must have $k \ge 2$ in the PIR scheme to begin with, and thus
max(k, 2) = k is the optimal number of servers. If, however, we are willing to settle for
computational privacy, then we can achieve the setup stage with a single server.

⁹ Note that if the servers do not discard the communication, the privacy and independence are
not compromised, but total independence is replaced by simple independence (which may be
extended to t-independence).

As for the second question, note that the lemma implies that there cannot exist a
protocol (even with an arbitrary number of servers) such that the database obtains $\pi(r \oplus x)$
and the servers jointly obtain no information about x. This is because if we consider the
information obtained by a coalition of all servers, the protocol reduces to a two-party
protocol. Thus, our setup stage cannot be improved in this sense, with respect to our function
$\pi(r \oplus x)$.

4.2 Analysis of the Basic Oblivious Scheme: Proof of Theorem 2 

We now analyze the oblivious scheme in terms of complexity, privacy, and correctness, 
to show that it satisfies the bounds given in theorem 2. 

It is not hard to verify that the setup stage computation is correct, namely that indeed
$y = \pi(x \oplus r)$. Now the correctness of the scheme follows from the correctness of the
underlying S: since the user uses S to obtain $r_i$ and $j = \pi(i)$, it follows that $y_j = x_i \oplus r_i$
and thus $x_i = y_j \oplus r_i$.

The communication complexity of the scheme is at most
$(\log n + 1) C_S(1, n) + \log n + 1$, where $C_S(l, n)$ is the communication complexity that the
underlying scheme S requires to retrieve a block of l bits out of n bits. This expression is based
on bit-by-bit retrieval, as discussed above. Alternatively, any other method for retrieving blocks
can be used, yielding communication complexity
$C_S(\log n + 1, n(\log n + 1)) + \log n + 1$, which may be lower than the general expression.
The computation complexity of the database is O(1) during the on-line stage because it needs
to send only one bit of information to the user. During the setup stage, the computation of the
database has linear complexity, which is similar to the amount of work it needs to do in
order to replicate itself in the original PIR model. The communication complexity of
the setup stage is $O(n \log n)$, which is a factor of log n over the O(n) of existing PIR
algorithms, where the database has to be replicated.

Total independence is clearly achieved, since the auxiliary servers may all be
fixed in advance, and do not change their content after the setup stage.

Database Privacy is also guaranteed by our scheme, even if the underlying S is not
database private. This is because, no matter what information the user obtains about $\pi$
and r, this information is completely independent of the data x. The user gets only a
single bit of information which is related to x, and this is the bit $y_j$ at a location
j of the user's choice. Note that since $y = \pi(x \oplus r)$, the bit $y_j$ depends only on a single
physical bit of x.

User Privacy with respect to the servers follows directly from the user privacy of
the underlying scheme, and user privacy with respect to D is maintained in a single
execution of the scheme, and in multiple executions up to equality between queries, as we
prove below. However, if in multiple executions two users are interested in the same
index i, the database will receive the same query $j = \pi(i)$, and will thus know that the
two queries are the same. This will be dealt with in section 4.3. We proceed to prove
user privacy up to equality with respect to the database D.

Consider an arbitrary sequence $(i_1, \ldots, i_m)$ of query indices which are all distinct.
We will prove that the distribution of D's view after the setup stage and m executions of
the on-line stage with these indices is independent of the values $i_1, \ldots, i_m$.

Lemma 2. Let $V_{setup}$ be the view of D after performing the setup stage. For every
permutation $\tilde{\pi} : [1..n] \to [1..n]$, $\mathrm{Prob}[\pi = \tilde{\pi} \mid V_{setup}] = \frac{1}{n!}$, where probability
is taken over all the random setup choices $\pi, \pi^1, r, r^1$. In particular, D does not get any
information about the permutation $\pi$ from the setup stage.
Proof. D's view consists of $V_{setup} = [x^1, x^2, \pi^2, v = \pi^1(r^1 \oplus x^1), u = \pi(r^2 \oplus x^2)]$.

Given this view, every choice for a permutation $\pi$ fixes the choices of $\pi^1, r^1, r^2$, namely
$\pi^1 = (\pi^2)^{-1} \circ \pi$, $r^1 = (\pi^1)^{-1}(v) \oplus x^1$ and $r^2 = \pi^{-1}(u) \oplus x^2$. That is, every $\pi$
corresponds to a single choice $(\pi^1, \pi^2, r^1, r^2)$ which generates the given view.

Since all these random choices of the setup stage are made uniformly and independently
of each other, each such choice is equally likely. Thus, the probability of a particular $\tilde{\pi}$
is $\frac{1}{n!}$. □
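For a tiny instance the counting argument of the proof can be checked exhaustively. The Python sketch below (hypothetical helper names; same permutation convention as the earlier sketches, with tuples so views can be dictionary keys) fixes x and D's share $x^1$, enumerates all setup randomness $(\pi, \pi^1, r, r^1)$ for n = 3, and records which $\pi$ produced each remaining view $(\pi^2, v, u)$ of D.

```python
import itertools
from collections import defaultdict


def apply_perm(p, s):
    out = [0] * len(s)
    for i, bit in enumerate(s):
        out[p[i]] = bit
    return tuple(out)


def compose(p2, p1):
    return tuple(p2[p1[i]] for i in range(len(p1)))


def inverse(p):
    inv = [0] * len(p)
    for i, target in enumerate(p):
        inv[target] = i
    return tuple(inv)


def xor(a, b):
    return tuple(u ^ v for u, v in zip(a, b))


def setup_views(x, x1):
    """Map each view (pi2, v, u) of D to the list of pi values that generate it."""
    n = len(x)
    x2 = xor(x, x1)
    perms = list(itertools.permutations(range(n)))
    strings = list(itertools.product((0, 1), repeat=n))
    views = defaultdict(list)
    for pi in perms:
        for pi1 in perms:
            pi2 = compose(pi, inverse(pi1))
            for r in strings:
                for r1 in strings:
                    r2 = xor(r, r1)
                    v = apply_perm(pi1, xor(r1, x1))
                    u = apply_perm(pi, xor(r2, x2))
                    views[(pi2, v, u)].append(pi)
    return views
```

With n = 3 there are 6 · 8 · 8 = 384 possible views, and each one is generated by every one of the 3! permutations exactly once, which is the uniform conditional distribution claimed in lemma 2.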

Lemma 3. (User Privacy) Let $(i_1, \ldots, i_m)$ be a tuple of distinct indices in [1..n]. Let
$V_{setup}$ be the view of D after the setup stage, and $V(i_1, \ldots, i_m)$ be the view of D for m
executions of the on-line stage with queries $i_1, \ldots, i_m$. Then for every tuple $(j_1, \ldots, j_m)$
of distinct indices, and for every setup view $V_{setup}$,

$\mathrm{Prob}[V(i_1, \ldots, i_m) = (j_1, \ldots, j_m) \mid V_{setup}] = \frac{(n-m)!}{n!}$,

where probability is taken over all the random choices $\pi, \pi^1, r, r^1$. In particular, the view
is independent of the user queries.

Proof. Since after the setup stage D did not get any information about $\pi$, as proved in
lemma 2, every $\pi$ is equally likely, and thus the given tuple $(j_1, \ldots, j_m)$ may
correspond to any original query tuple $(i_1, \ldots, i_m)$ with equal probability. A formal
derivation follows. Denote $\Pi = \{\pi \mid \pi(i_1) = j_1, \ldots, \pi(i_m) = j_m\}$. Given $\pi$, the on-line view
is exactly $(\pi(i_1), \ldots, \pi(i_m))$, and $|\Pi| = (n-m)!$, so

$\mathrm{Prob}[V(i_1, \ldots, i_m) = (j_1, \ldots, j_m) \mid V_{setup}] = \sum_{\pi} \mathrm{Prob}[\pi \mid V_{setup}] \cdot \mathrm{Prob}[V(i_1, \ldots, i_m) = (j_1, \ldots, j_m) \mid V_{setup}, \pi] = \sum_{\pi \in \Pi} \mathrm{Prob}[\pi \mid V_{setup}] = \sum_{\pi \in \Pi} \frac{1}{n!} = \frac{(n-m)!}{n!}$. □

We proved that any two tuples of distinct queries $(i_1, \ldots, i_m)$ and $(i'_1, \ldots, i'_m)$
induce the same distribution over the communication $(j_1, \ldots, j_m)$ sent to D. Therefore,
the basic scheme is user private up to equality.
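The final step of the derivation, counting permutations that map the queries to the observed locations, can be verified numerically. This Python sketch (hypothetical function name) computes, for a uniformly random permutation of `range(n)`, the probability that distinct queries $(i_1, \ldots, i_m)$ are sent to the locations $(j_1, \ldots, j_m)$, using exact fractions.

```python
import itertools
from fractions import Fraction
from math import factorial


def view_probability(n, queries, answers):
    """Fraction of permutations pi of range(n) with pi(i_k) = j_k for every k."""
    perms = list(itertools.permutations(range(n)))
    hits = sum(all(p[i] == j for i, j in zip(queries, answers)) for p in perms)
    return Fraction(hits, len(perms))
```

For n = 5 and m = 2 every pair of distinct queries yields the same value $3!/5! = 1/20$ for a fixed pair of observed locations, matching lemma 3.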

4.3 Eliminating Detection of Repeated Queries: Proof of Theorem 3 

In order for the oblivious database scheme to be complete, we need to generalize the
basic scheme so that it guarantees user privacy completely, and not only up to equality
between repeated executions. To extend it to full privacy we need to ensure that no two
executions will ever ask for the same location j. To achieve this, we use a buffer of some
size m, in which all (question, answer) pairs $(j, y_j)$ that have been queried are recorded.
The on-line stage is changed as follows: the user who is interested in index i first obtains
the corresponding $r_i$ and j from the servers, similarly to the basic version. He does that by
running S to obtain the bit $r_i$, and (in parallel) using the most efficient way available to
obtain the block j (again, a possible way to do it is by running S(1, n) log n times).¹⁰
Then the user scans the buffer. If the pair $(j, y_j)$ is not there, the user asks D for $y_j$ (as
in the basic scheme). If the desired pair is there, the user asks D for $y_{j'}$ at some random
location j' not appearing in the buffer so far. In either case, the pair (location, answer)
that was asked from D is added to the buffer.
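The buffer rule can be sketched as a small Python class. This is an illustration under assumptions rather than the authors' code: the buffer is modeled as one shared object, `ask_d(loc)` is a hypothetical callback standing for sending `loc` to D in the clear and receiving $y_{loc}$, and the location j is assumed to have already been obtained from the servers.

```python
import random


class QueryBuffer:
    """Hypothetical sketch of the buffer of (location, answer) pairs.

    `ask_d(loc)` models asking D for y_loc; `m` is the buffer size (m <= n),
    after which the setup stage must be rerun with a fresh r and pi.
    """

    def __init__(self, n, m, ask_d, rng=None):
        if m > n:
            raise ValueError("buffer size m cannot exceed n")
        self.n, self.m, self.ask_d = n, m, ask_d
        self.pairs = {}  # location -> bit already obtained from D
        self.rng = rng or random.SystemRandom()

    def fetch(self, j):
        """Return y_j without ever asking D for a location that was already asked."""
        if len(self.pairs) >= self.m:
            raise RuntimeError("buffer full: rerun the setup stage with fresh r and pi")
        if j in self.pairs:
            # Repeated location: send a dummy query for a random fresh location instead.
            fresh = [loc for loc in range(self.n) if loc not in self.pairs]
            loc = self.rng.choice(fresh)
            self.pairs[loc] = self.ask_d(loc)
            return self.pairs[j]
        self.pairs[j] = self.ask_d(j)
        return self.pairs[j]
```

From D's side, every request is for a location never requested before, so repeated indices are no longer visible; the cost is that after m requests the setup stage must be rerun.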

Clearly, a buffer size m results in an additive term of m log n in the communication
complexity over the basic scheme. On the other hand, after m executions the buffer is
full, and the system has to be reinitialized (i.e., the setup stage is repeated, with new r and $\pi$).
Thus, we want to choose m as big as possible without increasing the communication
complexity much. A suitable choice for existing schemes is therefore to set m equal to the
communication complexity of the underlying S. This only increases the communication
complexity by a constant factor, and still allows for a polynomial number of
executions before reinitialization is needed. We note that in many practical situations,
reinitialization after m steps is likely to be needed anyway, as the database itself changes
and needs to be updated.

¹⁰ So far we are doing the same as in the basic scheme, except that we insist that $r_i$ is retrieved
separately from j. This is done in order to maintain database privacy in case the underlying S is
database private, as proved below, and it does not change the communication complexity.

The database privacy in this case depends on the underlying scheme S: if S is database
private (a SPIR scheme), then so is our scheme. This is because, when running S, the
user gets only a single physical bit out of r. Now, no matter how much information
the user obtains about $\pi$ or y (either from direct queries or from scanning the buffer),
the data x is masked by r (namely $y = \pi(x \oplus r)$), and thus the user may only obtain
information depending on a single physical bit of x.

The other privacy and correctness properties can be verified similarly to the basic 
scheme proofs of section 4.2. 
A Setup Stage of RDB

For the basic RDB scheme, the database owner D chooses uniformly at random t + 1
random servers $R_1, \ldots, R_{t+1} \in \{0,1\}^n$ such that for every $1 \le j \le n$,
$R_1(j) \oplus \cdots \oplus R_{t+1}(j) = D(j) = x_j$, i.e., the xor of all the servers is the original data string x.
This is done by first choosing $R_1, \ldots, R_t$ containing completely random strings (universal
servers), and then using the following protocol, which allows D to prepare an appropriate
content for the tailored server $R_{t+1}$. Since we do not allow the servers to gain any
information about each other, the result of this computation should only go to the database.
One possible way would be to let the database read the content of all universal servers,
but this would give the database much more information than it needs, which may be
a source of future security problems.¹¹ Thus, we use a simple multi-party protocol for
computing the xor, at the end of which D learns $R_{t+1}$ but no other information, and the
servers do not learn any new information.

Computing $R_{t+1} = R_1 \oplus \cdots \oplus R_t \oplus x$: Each of the servers $R_s$ ($1 \le s \le t$)
first shares its content among all others and D, by choosing uniformly at random t + 1
shares $a_{s1}, \ldots, a_{st}, a'_s$ that xor to $R_s$. Each $a_{sj}$ is sent to $R_j$ ($R_s$ keeps $a_{ss}$), and
$a'_s$ is sent to D. Next, every server $R_j$ xors all the shares $a_{1j}, \ldots, a_{tj}$ it holds (including its
own share $a_{jj}$) and sends the result to D, who now xors all the messages and x to obtain the
desired content for $R_{t+1}$.
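The summing protocol can be simulated in Python to confirm that D ends up with exactly $R_1 \oplus \cdots \oplus R_t \oplus x$. This sketch only checks correctness of the arithmetic; message passing is modeled with plain lists, the function names are hypothetical, and no claim is made here about formally verifying the privacy guarantees.

```python
import secrets


def rand_bits(n):
    return [secrets.randbelow(2) for _ in range(n)]


def xor_all(strings, n):
    acc = [0] * n
    for s in strings:
        acc = [a ^ b for a, b in zip(acc, s)]
    return acc


def tailored_content(servers, x):
    """Simulate the sharing protocol for R_{t+1} = R_1 ⊕ ... ⊕ R_t ⊕ x.

    servers[s] is the content of R_{s+1}; only D ever uses x.
    """
    t, n = len(servers), len(x)
    held = [[] for _ in range(t)]  # held[j]: shares a_{sj} held by server R_{j+1}
    to_d = []                      # shares a'_s sent to D
    for R in servers:
        shares = [rand_bits(n) for _ in range(t)]  # a_{s1}, ..., a_{st}
        to_d.append(xor_all(shares + [R], n))      # a'_s, so all t + 1 shares xor to R_s
        for j in range(t):
            held[j].append(shares[j])              # a_{sj} goes to R_j (a_{ss} stays put)
    messages = [xor_all(h, n) for h in held]       # each server sends one xor to D
    return xor_all(messages + to_d + [x], n)       # D's final computation
```

Each server sends D a single n-bit string, and D receives from each $R_s$ one additional random share, so the protocol uses O(tn) bits of communication towards D.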

For the RDB scheme of section 3.2, the same summing protocol is performed k times
(computing $R_{t+1}^r$ for every $1 \le r \le k$).

B Proof of Lemma 1 

Assume towards contradiction that there is a protocol that privately computes the
function $f([\pi, r], [x]) = ([0], [\pi(x \oplus r)])$. That is, before the protocol starts, player $P_1$ holds
$(\pi, r)$ and player $P_2$ holds x. During the protocol $P_1, P_2$ may flip sequences of random
coins, denoted $c_1, c_2$ respectively. Denote the communication exchanged between the
players by $comm = comm(\pi, r, c_1, x, c_2)$. At the end of the protocol, $P_1$ gets no
information about x; $P_2$ may apply a reconstruction function $g(comm, x, c_2) = \pi(r \oplus x)$ to
obtain his output; and no other information about $(\pi, r)$ is revealed to $P_2$. Now, when
r is chosen uniformly, $(x, \pi(r \oplus x))$ gives no information about $\pi$, and thus $P_2$ obtains
no information about $\pi$. In particular, this means that given $P_2$'s view $(x, c_2, comm)$, any
permutation could have generated the given communication comm, or more formally,

$\forall x, c_2, \; \forall c_1, r, \pi, \pi', \; \exists r', c'_1 : \; comm(\pi, r, c_1, x, c_2) = comm(\pi', r', c'_1, x, c_2)$   (1)

We will show that this implies that $P_1$ can obtain information about x from the
communication, which contradicts the privacy of the protocol. Let $P_1$ conduct the following
mental experiment on his view $(\pi, r, c_1, comm)$. $P_1$ sets $\pi'$ to be the identity permutation,
and finds $r', c'_1$ that yield the same communication comm (such $r', c'_1$ exist by (1)). Now,
since
$\pi(r \oplus x) = g(comm(\pi, r, c_1, x, c_2), x, c_2) = g(comm(\pi', r', c'_1, x, c_2), x, c_2) = \pi'(r' \oplus x) = r' \oplus x$,
$P_1$ can find out that x satisfies the equation

$x = r' \oplus \pi(r \oplus x)$.   (2)

Since (2) cannot hold for all x (unless n = 1), we have shown that $P_1$ obtains some
information about x, which concludes the proof. □



¹¹ E.g., in a setting where the same universal random servers may be used by multiple
applications.
Abstract. We describe almost optimal (on the average) combinatorial algorithms for the following algorithmic problems: (i) computing the boolean matrix product, (ii) finding witnesses for boolean matrix multiplication and (iii) computing the diameter and all-pairs-shortest-paths of a given (unweighted) graph/digraph. For each of these problems, we assume that the input instances are drawn from suitable distributions. A random boolean matrix (graph/digraph) is one in which each entry (edge/arc) is set to 1 or 0 (included) independently with probability p. Even though fast algorithms have been proposed earlier, they are based on algebraic approaches which are complex and difficult to implement. Our algorithms are purely combinatorial in nature and are much simpler and easier to implement. They are based on a simple combinatorial approach to multiply boolean matrices. Using this approach, we design fast algorithms for (a) computing the product and witnesses when A and B are both random boolean matrices or when A is random and B is arbitrary but fixed (or vice versa) and (b) computing the diameter, distances and shortest paths between all pairs in the given random graph/digraph. Our algorithms run in $O(n^2 \log n)$ time with $O(n^{-3})$ failure probability, thereby yielding algorithms with expected running times within the same bounds. Our algorithms work for all values of p.



1 Introduction 

Given two boolean matrices A and B of dimension $n \times n$ each, their product C is defined as $c_{ij} = \bigvee_{k=1}^{n} (a_{ik} \wedge b_{kj})$. If $c_{ij} = 1$, then a witness for this fact is any index k such that $a_{ik} = b_{kj} = 1$. The witness problem is to find one witness for each $c_{ij}$ which has a witness. Computing boolean matrix multiplication and the associated witness problem naturally arises in solving many path problems like shortest paths [2, 3, 10, 16] and verifying the diameter of a given graph [6].
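As a baseline for the definitions above, the following minimal sketch computes C and a witness matrix W directly from the definition, in $O(n^3)$ time. It is an illustration only, not one of the paper's algorithms; the function name `bmm_witnesses_naive` is hypothetical, matrices are assumed to be Python lists of 0/1 rows, and witnesses are reported 1-based (k = 1, ..., n) with 0 meaning "no witness".

```python
def bmm_witnesses_naive(A, B):
    """Direct method: for every pair (i, j), scan k until a_ik = b_kj = 1.

    Witnesses are 1-based (k = 1..n); W[i][j] == 0 means c_ij = 0.
    """
    n = len(A)
    C = [[0] * n for _ in range(n)]
    W = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if A[i][k] and B[k][j]:
                    C[i][j] = 1
                    W[i][j] = k + 1
                    break  # one witness per pair is enough
    return C, W


A = [[1, 0],
     [1, 1]]
B = [[0, 1],
     [1, 0]]
C, W = bmm_witnesses_naive(A, B)
print(C)  # [[0, 1], [1, 1]]
print(W)  # [[0, 1], [2, 1]]
```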

The only known approach to solving the boolean matrix multiplication (BMM) problem is by reducing it to matrix multiplication over the integers by treating each 0,1-entry as an integer: $c_{ij} = 1$ as per BMM if and only if $c_{ij} > 0$ as per integer matrix multiplication. The currently best algorithm for multiplying matrices over arbitrary rings is due to Coppersmith and Winograd [9] and runs in $O(n^{\omega})$ time where $\omega = 2.376\ldots$.

Recently, Alon and Naor [3] and independently Galil and Margalit [10] have shown how, by repeatedly using matrix multiplication, one can solve the boolean matrix witness problem (BMWP) in $O(n^{\omega}(\log n)^{O(1)})$ time. The approach used in [3] derandomizes a simple randomized algorithm and is based on some recent results on small size almost c-wise independent probability spaces.

The previous approaches to multiplying arbitrary matrices naturally use algebraic techniques and are quite complex and very difficult to implement. But the BMM and BMWP problems are basically combinatorial in nature. For each i, j, one has to select a proper index k (if it exists) such that $a_{ik} = b_{kj} = 1$. There is no forcing argument which suggests that a combinatorial approach to these problems would not work efficiently. In view of the complex nature of the algebraic approach, it is desirable to have a combinatorial approach for solving these problems.

Basch, Khanna and Motwani [6] present a combinatorial approach that solves these problems in $O(n^3/(\log n)^2)$ time. This works by dividing the columns of A and the rows of B into groups of size $\log n$ and treating each 0,1-vector of length $\log n$ as an integer between 0 and n. This helps them to obtain an $O(\log n)$ improvement over the $O(n^3/\log n)$ algorithm by Arlazarov et al. [5]. Other than these, there is no subcubic algorithm (based on a combinatorial approach) for these two problems. As of now, it is not even clear whether the BMM and BMWP problems have the same complexity.
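The gain in [6] comes from treating short 0,1-vectors as machine integers. The sketch below illustrates only that general idea of word-level parallelism: it packs each row of B into one Python integer and ORs whole rows together. It is not the algorithm of Basch, Khanna and Motwani, and the function name `bmm_bitset` is hypothetical. Because Python integers are unbounded, one integer here holds an entire row rather than a block of $\log n$ bits.

```python
def bmm_bitset(A, B):
    """Boolean product using Python ints as bit vectors.

    Bit j of row_bits[k] holds b_kj, so row i of C is the OR of the rows
    row_bits[k] over all k with a_ik = 1.
    """
    n = len(A)
    row_bits = [sum(1 << j for j in range(n) if B[k][j]) for k in range(n)]
    C = []
    for i in range(n):
        acc = 0
        for k in range(n):
            if A[i][k]:
                acc |= row_bits[k]  # OR in the whole k-th row of B at once
        C.append([(acc >> j) & 1 for j in range(n)])
    return C


print(bmm_bitset([[1, 0], [1, 1]], [[0, 1], [1, 0]]))  # [[0, 1], [1, 1]]
```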

In this paper, we present an alternative approach¹ to solve the BMM and BMWP problems. So far, we have not been able to provide a worst-case guarantee of $O(n^{3-\epsilon})$ time. However, using our approach, we design almost optimal $O(n^2 \log n)$ time algorithms for multiplying two random boolean matrices and also for finding witnesses. We can also achieve the same within the same time bounds when one of the matrices is random and the other is an arbitrary but fixed matrix. The failure probability of our algorithms is $O(n^{-3})$, leading to $O(n^2 \log n)$ expected time algorithms for both problems. It should be noted that the two results (both matrices random, and only one matrix random) are independent and do not imply one another. These results provide strong evidence that boolean matrix problems are probably efficiently solvable by our or other combinatorial approaches.

One important feature of our approach is that it takes into account the density of the 1s in the input matrices, whereas the algebraic approach seems indifferent to the density of 1s. By treating 0,1-vectors of length $O(\log n)$ as small integers, as is done in [6], our approach can also be guaranteed to have $O(n^3/(\log n)^2)$ worst-case running time. For some special classes of matrices, we obtain much better running times.

¹ The first author observed [15] this alternative approach and applied it to derive an $O(m + n \log n)$ average time algorithm to compute $Ab$, where A is a random boolean matrix and b is a fixed column vector. Here m refers to the expected number of 1s in A. Independently, the second author observed the same approach and used it to derive the results of Sections 2, 3, 4, 5 and 6. Upon learning of each other's work, we decided to write this joint paper.

Using our results on boolean matrices, we show how to compute the diameter and all-pairs shortest paths of a random graph/digraph in $O(n^2 \log n)$ time with probability $1 - o(1)$. Previously, the best-known algorithms (without using matrix multiplication) required $O(n^3/(\log n)^2)$ time to verify whether the diameter of a given graph is d or less, for fixed d. A shortest path between a pair i, j is implicitly specified by its first vertex (if any) other than i and j. Most of the previous work on average case analysis of shortest path algorithms has been only over randomly weighted complete digraphs: randomness was only in the weights and not in the presence of edges. Given such a graph, the idea is to prove that it is sufficient to consider only $O(n \log n)$ edges to find all shortest paths, and then to apply standard shortest path algorithms. In contrast, in our model, we introduce randomness in the presence of edges, and difficulties arise only when we have $\Omega(n \log n)$ edges. The two models are not comparable. To the best of our knowledge, our algorithmic results on finding shortest paths constitute the first work of this nature.

2 A-columns-B-rows Approach

In this paper, we describe a different approach to studying boolean matrix problems. The usual combinatorial approach can be termed the A-rows-B-columns approach. It works by considering a row of A and a column of B and looking for matching positions in these n-vectors. This follows the definition of BMM. In our approach, on the other hand, we consider any index k, the k-th column of A (denoted $A_{*k}$) and the k-th row of B (denoted $B_{k*}$). For any i, j such that $a_{ik} = 1 = b_{kj}$, k is a witness for the pair i, j and hence $c_{ij} = 1$. That is, every pair of 1s in $A_{*k}$ and $B_{k*}$ is witnessed by k. This implies that if $A_{*k}$ and $B_{k*}$ each have $\Omega(n)$ 1s, then we get a witness for each of $\Omega(n^2)$ pairs in constant time per pair. If we repeat this for every k, we get witnesses for all pairs. We call this method the A-columns-B-rows approach.
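The A-columns-B-rows idea can be sketched as follows. This sketch rests on a few assumptions. Python lists stand in for the paper's search trees or linked lists. The function name `a_columns_b_rows` is hypothetical. As in Step 3(a) of Bool-Matrix-Witnesses, a later k overwrites the witness recorded by an earlier one. The work is exactly $\sum_k |A_k| \cdot |B_k|$ pair updates, plus the $O(n^2)$ scan that builds the lists.

```python
def a_columns_b_rows(A, B):
    """Every pair (i, j) with i in A_k and j in B_k is witnessed by k.

    A_k = rows i with a_ik = 1 (column k of A); B_k = columns j with
    b_kj = 1 (row k of B). Witnesses are 1-based; 0 means none.
    """
    n = len(A)
    C = [[0] * n for _ in range(n)]
    W = [[0] * n for _ in range(n)]
    for k in range(n):
        A_k = [i for i in range(n) if A[i][k]]
        B_k = [j for j in range(n) if B[k][j]]
        for i in A_k:          # |A_k| * |B_k| pairs, constant time each
            for j in B_k:
                C[i][j] = 1
                W[i][j] = k + 1
    return C, W


C, W = a_columns_b_rows([[1, 0], [1, 1]], [[0, 1], [1, 0]])
print(C, W)  # [[0, 1], [1, 1]] [[0, 1], [2, 1]]
```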

One drawback is that any pair will be counted once for each of its witnesses. That is, if a pair has several witnesses, then it will be counted several times, resulting in redundant work. However, this approach gives faster combinatorial algorithms than previously possible for some special classes of boolean matrices.

First, we do a preprocessing step during which we group together all 1s in each $A_{*k}$ in a binary search tree $A_k$. Similarly, we group together all 1s in each $B_{k*}$ in a search tree $B_k$.² The information stored in the search trees also includes the row and column numbers. Then, we compute the size $n_A(k)$ of each $A_k$ and the size $n_B(k)$ of each $B_k$. For each k, there are exactly $n_A(k) \cdot n_B(k)$ pairs having k as a witness.

² Sometimes, it is useful to store the sets $A_k, B_k$ as doubly-linked lists. For both ways of storage, all pairs (i, j) ($i \in A_k$, $j \in B_k$) of elements can be enumerated in $O(|A_k| \cdot |B_k|)$ time. When there is a need for searching the sets, it is useful to store $A_k, B_k$ as binary search trees, even though it takes $O(|A_k| \log |A_k|)$ time to build the search trees.

If $\sum_k n_A(k) \cdot n_B(k) = m$, then we get an $O(m + n^2 \log n)$ algorithm. If $m = O(n^{2+\epsilon})$, in particular if each pair has at most $n^{\epsilon}$ witnesses, then BMM and BMWP can both be solved in $O(n^{2+\epsilon})$ time. For such special classes of matrices, this is certainly faster than the $O(n^3/(\log n)^2)$ algorithm.
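The quantity m decides whether plain enumeration is cheap enough. It can be computed in $O(n^2)$ time before any pairs are enumerated, as the sketch below shows. The helper names are hypothetical. The constant `const` in the threshold test is an assumption, because the text only states $m = O(n^{2+\epsilon})$.

```python
def witness_work(A, B):
    """m = sum_k n_A(k) * n_B(k): the number of (pair, witness) incidences,
    i.e. the work done by the A-columns-B-rows enumeration."""
    n = len(A)
    n_A = [sum(A[i][k] for i in range(n)) for k in range(n)]  # 1s in column k of A
    n_B = [sum(B[k][j] for j in range(n)) for k in range(n)]  # 1s in row k of B
    return sum(a * b for a, b in zip(n_A, n_B))


def enumeration_is_cheap(A, B, eps, const=1.0):
    """Hypothetical test for the deterministic case m <= const * n^(2 + eps)."""
    n = len(A)
    return witness_work(A, B) <= const * n ** (2 + eps)


print(witness_work([[1, 0], [1, 1]], [[0, 1], [1, 0]]))  # 3
```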

On the other hand, if each pair has either zero or a sufficiently large number of witnesses, then a witness for each pair can be found easily. Suppose the pair (i, j) has $n^{1-\epsilon}$ witnesses. If we repeatedly pick a random index k between 1 and n and check whether it is a witness for (i, j), then within $O(n^{\epsilon} \log n)$ steps we will encounter a witness for (i, j) with probability at least $1 - n^{-5}$. Thus we can obtain witnesses for all such pairs in $O(n^{2+\epsilon}(\log n)^2)$ time, with probability at least $1 - n^{-3}$. In fact, it is enough to pick a set of $O(n^{\epsilon} \log n)$ random values of k only once (as described in the algorithm given below). In particular, for $\epsilon = 0.5$, we can find a witness for each pair in $O(n^{2.5}(\log n)^2)$ time. This algorithm is deterministic if each pair has at most $n^{0.5}$ witnesses, or randomized (with failure probability at most $n^{-3}$) if each pair has either zero or at least $n^{0.5}$ witnesses.
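The sampling argument can be sketched as follows. Each random k is charged by enumerating $A_k \times B_k$, as in Step 3(a)i of Bool-Matrix-Witnesses below. The draw count uses the bound $(1 - l/n)^{c(n/l)\ln n} \le n^{-c}$ for one pair with at least l witnesses. Natural logarithms and the helper names are assumptions of this sketch. For simplicity it draws k from all indices rather than from the set N of the algorithm.

```python
import math
import random


def draws_needed(n, l, c=5):
    """Draws after which a fixed pair with >= l witnesses (out of n indices)
    is missed with probability <= (1 - l/n)^draws <= n^(-c)."""
    return math.ceil(c * (n / l) * math.log(n))


def sample_witnesses(A, B, draws, rng=random):
    """Draw `draws` indices k uniformly (with replacement) and record k as the
    witness of every pair in A_k x B_k. Witnesses are 1-based; 0 = none found."""
    n = len(A)
    W = [[0] * n for _ in range(n)]
    for _ in range(draws):
        k = rng.randrange(n)
        A_k = [i for i in range(n) if A[i][k]]
        B_k = [j for j in range(n) if B[k][j]]
        for i in A_k:
            for j in B_k:
                W[i][j] = k + 1
    return W


print(draws_needed(100, 10))  # 231
```

Each recorded k is a genuine witness by construction, so the only possible failure is a pair left at 0 despite having witnesses. That is the event bounded above.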

Not knowing whether each pair has at most $n^{\epsilon}$ ($\epsilon < 0.5$) or at least $n^{1-\epsilon}$ ($\epsilon < 0.5$) witnesses is not a handicap. For the former case, we merely need to check whether m (the total number of witnesses) is at most $n^{2+\epsilon}$. For the latter case, let l be the minimum non-zero number of witnesses that any pair has. Then it is enough to repeatedly pick a random index k between 1 and n at most $5(n/l)\log n$ times to ensure that we pick a witness for every pair (with at least one witness), with probability $> 1 - n^{-3}$. We still do not know the value of l. However, as shown below, we only need to know an approximate value of l. Note that since $l \ge n^{1-\epsilon}$ with $\epsilon < 0.5$, we have $l > n^{0.5}$.

Suppose we divide the interval [0, n] into $[0], [1, 2], (2, 4], (4, 8], \ldots, (2^{j-1}, n]$ for some j. Clearly, there are only $O(\log n)$ intervals and l lies in one of them. For the interval indexed by s, that is, for any $l' \in (2^s, 2^{s+1}]$, the maximum and the minimum values of $n/l'$ differ by a factor of at most 2. Moreover, as $l'$ goes from n down to $n^{0.5}$, $n/l'$ grows monotonically from 1 (when $l' = n$). Thus, starting with the last interval and going down over successive intervals, we use the lower bound $2^s$ as an approximate value of l and pick the required number of random values of k. When we come to the correct interval, we will find witnesses for all pairs with probability at least $1 - n^{-3}$. If $l \in (2^s, 2^{s+1}]$, the total time is $O\big((\sum_{s \le r \le j} n/2^r)\, n^2 (\log n)^2\big)$. But $\sum_{s \le r \le j} n/2^r < 2n/2^s = O(n/l)$. Thus the total running time is $O(n^{2+\epsilon}(\log n)^2)$, the same as when we know the exact value of l a priori.
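The geometric-sum step can be checked numerically. The sketch below lists the stages used when the guess $2^r$ for l is halved from $\lfloor \log_2 n \rfloor$ downwards, together with the number of draws at each stage. It then checks that $\sum_r n/2^r$ stays below $2n/2^{s}$ for the last stage s. The function name, the constant c = 5 and the use of natural logarithms for the draw counts are assumptions.

```python
import math


def doubling_schedule(n, l_low, c=5):
    """Stages r = floor(log2 n), ..., floor(log2 l_low): at stage r, use 2^r as
    the guess for l and draw ceil(c * (n / 2^r) * ln n) random indices k."""
    r_top = math.floor(math.log2(n))
    r_bot = math.floor(math.log2(l_low))
    return [(r, math.ceil(c * (n / 2 ** r) * math.log(n)))
            for r in range(r_top, r_bot - 1, -1)]


n, l_low = 1024, 40
stages = doubling_schedule(n, l_low)
r_bot = stages[-1][0]
# Geometric sum over the stages stays below 2n / 2^{r_bot}.
print(sum(n / 2 ** r for r, _ in stages), 2 * n / 2 ** r_bot)  # 63.0 64.0
```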

The algorithm explained so far is formally described below as algorithm Bool-Matrix-Witnesses. Based on the arguments given above, we can assert the following: (i) If $m = O(n^{2+\epsilon})$ for some $\epsilon < 0.5$, then C and W can be computed deterministically in $O(n^2 + m)$ time. (ii) If every pair has either zero or at least $n^{1-\epsilon}$ (or, more strongly, $|N|/n^{\epsilon}$, with N defined in Step 4) witnesses for some $\epsilon < 0.5$, then C and W can be computed in $O(n^{2+\epsilon}(\log n)^2)$ time with probability at least $1 - n^{-3}$. (iii) The algorithm always runs in $O(n^{2.5}(\log n)^2)$ time.

Remark 1. While assertion (ii) can also be realized by using the direct method, it does not seem possible to realize assertion (i) by the direct method.

Algorithm 2.1 Bool-Matrix-Witnesses

(Input: Boolean matrices A and B of size n × n each.)

(Output: Product matrix C and witness matrix W.)

1. for each i, j ∈ {1, 2, ..., n}, set $c_{ij} = w_{ij} = 0$.

2. for k = 1 to n do
   (a) Store the 1s in $A_{*k}$ in a search tree $A_k$ and store the 1s in $B_{k*}$ in a search tree $B_k$.
   (b) Compute $n_A(k) = |A_k|$ and $n_B(k) = |B_k|$.

3. if $\sum_k n_A(k) n_B(k) = O(n^{2.5} \log n)$ then
   (a) for k = 1 to n do
       i. for all pairs (i, j) where $i \in A_k$, $j \in B_k$, set $c_{ij} = 1$ and $w_{ij} = k$.
   (b) Exit.

4. Compute $N = \{k \mid n_A(k) \neq 0,\ n_B(k) \neq 0\}$, $r_u = \lfloor \log_2 n \rfloor$, $r_l = \lfloor \log_2 n^{0.5} \rfloor$ and set $r = r_u$.

5. repeat
   (a) Randomly (with replacement) draw $s = 5(n/2^r)\log n$ integers $k_1, k_2, \ldots, k_s$ from N.
   (b) for each $k \in \{k_1, k_2, \ldots, k_s\}$ do execute Step 3(a)i.
   (c) Set $r = r - 1$.
   until either $r < r_l$ or there are at most $n^{1.5}$ pairs with no witnesses so far.

6. if there are at most $n^{1.5}$ pairs with no witnesses, find the witnesses for these pairs by a direct method.

3 An $O(n^2 \log n)$ Algorithm for Random Boolean Matrices

In this section, we show that if both input matrices are random instances, we can use our approach to obtain almost optimal $O(n^2 \log n)$ combinatorial algorithms for finding witnesses for all pairs, with very high probability. We consider the following random model. A and B are both random boolean matrices. Each entry in both of them is set (independently) to 1 with probability $p = p(n)$ and to 0 otherwise. Our results hold for all values of p(n). For this section, we assume that the sets $A_k, B_k$ of non-zero entries in $A_{*k}, B_{k*}$ respectively are stored in doubly-linked lists. We initially set the product matrix $c_{ij}$ and the witness matrix $w_{ij}$ both to zero for all i and j. This takes $O(n^2)$ time. We will often use the CH bounds [17, 13, 14] given below.

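Chernoff-Hoeffding (CH) bounds [17]: Let $X_1, \ldots, X_m$ be independent random variables that take on values in [0, 1], with $E[X_i] = p_i$, $1 \le i \le m$. Let $X = \sum_i X_i$ and $\mu = E(X) = \sum_i p_i$. Then,

$$\Pr(X < \mu(1-\delta)) \le e^{-\mu\delta^2/2} \quad \text{for any } \delta \in [0, 1]$$

$$\Pr(X > \mu(1+\delta)) \le e^{-\mu\delta^2/3} \quad \text{for any } \delta \in [0, 1]$$

$$\Pr(X > \mu(1+\delta)) \le e^{-\mu(1+\delta)\ln(1+\delta)/4} \quad \text{for any } \delta > 1$$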
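The CH bounds can be sanity-checked against exact binomial tails, since a binomial variable is a sum of independent 0/1 variables. The sketch below compares the three bounds with exact tail probabilities for one small example. The parameter values and helper names are arbitrary choices for illustration.

```python
import math


def binom_pmf(m, p, x):
    return math.comb(m, x) * p ** x * (1 - p) ** (m - x)


def lower_tail(m, p, t):
    """Exact Pr(X < t) for X ~ Binomial(m, p)."""
    return sum(binom_pmf(m, p, x) for x in range(m + 1) if x < t)


def upper_tail(m, p, t):
    """Exact Pr(X > t) for X ~ Binomial(m, p)."""
    return sum(binom_pmf(m, p, x) for x in range(m + 1) if x > t)


m, p = 200, 0.1
mu = m * p
d1, d2 = 0.5, 2.0   # d1 in [0, 1] for the first two bounds, d2 > 1 for the third
print(lower_tail(m, p, mu * (1 - d1)) <= math.exp(-mu * d1 ** 2 / 2))  # True
print(upper_tail(m, p, mu * (1 + d1)) <= math.exp(-mu * d1 ** 2 / 3))  # True
print(upper_tail(m, p, mu * (1 + d2))
      <= math.exp(-mu * (1 + d2) * math.log(1 + d2) / 4))              # True
```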
Case $p(n) \le 10\log n/n$:

By CH bounds, for any fixed k, the number of non-zero entries in $A_{*k}$ is $O(\log n)$ with probability³ at least $1 - n^{-4}$. A similar statement holds for the matrix B. This implies that, with probability at least $1 - O(n^{-3})$, we have

$$n_A(k),\ n_B(k) = O(\log n) \quad \text{for all } k.$$

Hence

$$\sum_k n_A(k) \cdot n_B(k) = O(n(\log n)^2).$$

Thus we can find witnesses (if there are any) for all pairs in $O(n^2 + n(\log n)^2) = O(n^2)$ time.

Case $10\log n/n < p(n) \le 10\sqrt{\log n/n}$:

Again by Chernoff-Hoeffding, for any fixed k and any constant $\delta \ge 1$, the number of non-zero entries in $A_{*k}$ is at most $10(1+\delta)\sqrt{n\log n}$ with probability at least $1 - e^{-10(1+\delta)\ln(1+\delta)\sqrt{n\log n}/4} \ge 1 - n^{-4}$. This implies that, with probability at least $1 - O(n^{-3})$,

$$n_A(k),\ n_B(k) \le 10(1+\delta)\sqrt{n\log n} \quad \text{for all } k.$$

Hence

$$\sum_k n_A(k) \cdot n_B(k) = O(n^2 \log n),$$

and witnesses for all pairs can be found in $O(n^2 \log n)$ time.

Case $10\sqrt{\log n/n} < p(n) \le 1$:

First, we compute an approximation to p as follows: $q = NZ/n^2$, where NZ is the number of non-zero entries in A. Since each entry is set to 1 or 0 independently, by CH bounds we have, for any $0 < \delta < 1$,

$$(1-\delta)n^2 p \le NZ \le (1+\delta)n^2 p$$

with probability at least $1 - 2e^{-\delta^2 n^2 p/3}$. As long as $p \ge n^{-2+\epsilon}$ and $\delta = (\log n)^{-M}$ for fixed positive $\epsilon, M$, we have $1 - 2e^{-\delta^2 n^2 p/3} \ge 1 - n^{-5}$, say. This means that $q = p[1 \pm o(1)]$ with polynomially low failure probability.

Now fix a value of i and fix $\delta = 0.8$. As before, we assume that the sets $B_{k*}$ have been found and stored in doubly-linked lists $B_k$. Let $n_0 = \lceil 25\log n/q^2 \rceil$. Due to the lower bound on p in this case, $n_0 < n/4$. Consider the first $n_0$ columns of the i-th row, that is, the entries $a_{i,1}, \ldots, a_{i,n_0}$. By CH bounds, with probability at least $1 - O(n^{-5})$, there are at least $(1-\delta)n_0 p$ and at most $(1+\delta)n_0 p$ columns among the first $n_0$ columns that have a 1-entry in the i-th row.

Let $k_1, \ldots, k_s$ be the corresponding column indices, where $(1-\delta)n_0 p \le s \le (1+\delta)n_0 p$ with very high probability. Now consider the rows of B with the corresponding indices $k_1, \ldots, k_s$. There are ns possible 0,1-entries in total. Of these, there will be at least $(1-\delta)nsp$ and at most $(1+\delta)nsp$ non-zero entries, with probability $1 - O(n^{-5})$. Substituting the value of $n_0$ and the bounds on q and s, we deduce that (up to a factor $1 \pm o(1)$) there will be at least $25(1 + \delta^2 - 2\delta)\,n\log n$ and at most $25(1 + \delta^2 + 2\delta)\,n\log n$ non-zero entries in the rows $B_{k_1*}, \ldots, B_{k_s*}$. Also, for any column j,

$$\Pr(b_{k_l,j} = 0,\ \forall l = 1, \ldots, s) = (1-p)^s \le (1-p)^{(1-\delta)n_0 p} = O(n^{-5}).$$

Hence, with probability at least $1 - O(n^{-4})$, for every j the pair (i, j) has at least one witness in $\{k_1, \ldots, k_s\}$, and the total number of such witnesses is $O(n\log n)$.

The indices $k_1, \ldots, k_s$ can be found in O(n) time. For each column, the sets $B_{k_1}, \ldots, B_{k_s}$ represent the set of all its witnesses within $\{k_1, \ldots, k_s\}$. Thus the total running time for finding witnesses for all pairs corresponding to the i-th row is $O(n\log n)$, and witnesses for all pairs can be found in $O(n^2\log n)$ time with probability at least $1 - O(n^{-3})$.

³ Actually, when $p(n) = o(\log n/n)$, we cannot directly deduce this failure probability from CH bounds because the expected number of 1s in $A_{*k}$ becomes $o(\log n)$. However, we can overcome this problem by choosing a larger value for p and applying CH bounds. This is justified since we are only trying to upper bound the number of 1s.

Remark 2. If we apply the straightforward $O(n^3)$ algorithm upon the failure of our $O(n^2\log n)$ algorithm, we get an $O(n^2\log n)$ expected time algorithm for the BMM and BMWP problems. The above analysis also works when B = A. Hence, for a random A, computing $A^2$ and its witnesses can be done in $O(n^2\log n)$ time.
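The dense-case procedure of this section can be sketched row by row. For row i, only the 1s among the first $n_0$ columns of A serve as candidate witnesses $k_1, \ldots, k_s$, and for each such k the whole list $B_k$ is scanned. In this sketch, Python lists replace doubly-linked lists and natural logarithms are assumed. The function names are hypothetical, and $n_0$ is capped at n so that the code also runs on small inputs, where the asymptotic bound $n_0 < n/4$ does not yet apply. A pair left at 0 either has no witness or hits the low-probability failure event, in which case the text falls back to the $O(n^3)$ method (Remark 2).

```python
import math


def estimate_density(A):
    """q = NZ / n^2, the estimate of p used in the dense case."""
    n = len(A)
    return sum(map(sum, A)) / n ** 2


def witnesses_dense_random(A, B, q):
    """Per row i, use only the 1s among the first n0 = ceil(25 ln n / q^2)
    columns of A as candidate witnesses, scanning B_k for each of them.
    Witnesses are 1-based; 0 means no witness was found."""
    n = len(A)
    n0 = min(n, math.ceil(25 * math.log(n) / q ** 2))
    B_lists = [[j for j in range(n) if B[k][j]] for k in range(n)]  # the sets B_k
    W = [[0] * n for _ in range(n)]
    for i in range(n):
        candidates = [k for k in range(n0) if A[i][k]]  # k_1, ..., k_s
        for k in candidates:
            for j in B_lists[k]:
                W[i][j] = k + 1
    return W
```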

4 An $O(n^2\log n)$ Algorithm for Random A and Fixed Arbitrary B

When only A is assumed to be a random matrix and B is arbitrary but fixed, we can find witnesses in $O(n^2\log n)$ time with probability at least $1 - n^{-3}$. Let $L = L(n)$ be the positive integer such that $2^L \le n/(10\log n) < 2^{L+1}$. Let $m = 10\log n/n$ and divide the interval [0, 1] into $[0, m], [m, 2m], \ldots, [2^l m, 2^{l+1} m], \ldots, [2^L m, 1]$. As before, we assume that each entry in A is chosen with probability p(n). Initially, we assume that the algorithm knows the value of p. Also, we maintain the non-zero entries of each column of B in a doubly-linked list. As before, we initially set $c_{ij} = w_{ij} = 0$ for all i, j.

Case $p(n) \le 10\log n/n$:

As before, for any fixed k, we have $n_A(k) = O(\log n)$ with probability at least $1 - n^{-4}$. Hence $\sum_k n_A(k) \cdot n_B(k) = O(n^2\log n)$. Thus, witnesses for all pairs can be found in $O(n^2\log n)$ time with probability at least $1 - n^{-3}$.

Case $2^l \cdot 10\log n/n < p(n) \le 2^{l+1} \cdot 10\log n/n$ for some l with $0 \le l < L$:

For any j such that the j-th column of B, $B_{*j}$, has at most $n/2^l$ 1s, we can certainly find witnesses for all pairs (i, j) ($1 \le i \le n$) by scanning all non-zero entries of the k-th column of A, for each k such that $b_{kj} = 1$. The number of such non-zero entries is $O(n^2 p/2^l) = O(n\log n)$ with probability $\ge 1 - n^{-4}$. This takes $O(n\log n)$ time for each such j.

If j is such that $B_{*j}$ has at least $n/2^l$ 1s, then consider the first $n/2^l$ row positions $k_1, \ldots, k_s$, where $s = n/2^l$, corresponding to non-zero entries in $B_{*j}$. For any fixed i,

$$\Pr(a_{i,k_r} = 0,\ \forall r = 1, \ldots, s) = (1-p)^{n/2^l} = O(n^{-5}).$$

Hence every pair (i, j) has a witness in $\{k_1, \ldots, k_s\}$, and these witnesses can be found by scanning the non-zero entries of $A_{*k_1}, \ldots, A_{*k_s}$. But there are only $O(n^2 p/2^l) = O(n\log n)$ non-zero entries in these columns, with probability $\ge 1 - n^{-4}$. This takes $O(n\log n)$ time for each such j. Hence witnesses for all pairs can be found in $O(n^2\log n)$ time with probability at least $1 - n^{-3}$.

Case $2^L \cdot 10\log n/n < p(n) \le 1$:

As before, for any j, consider all 1s in $B_{*j}$ if there are at most $n/2^L$ of them; otherwise consider the first $n/2^L$ 1s in $B_{*j}$. Let $k_1, \ldots, k_s$ be the row positions corresponding to these 1s. We have $s \le n/2^L$ in the former case and $s = n/2^L$ in the latter case, and $n/2^L < 20\log n$. Let us consider the non-zero entries in the columns $A_{*k_1}, \ldots, A_{*k_s}$. In the former case, these 1s give witnesses for all pairs (i, j) (which have a witness), and it takes $O(n\log n)$ time for each such j. In the latter case, consider any fixed i. Then,

$$\Pr(a_{i,k_r} = 0,\ \forall r = 1, \ldots, s) = (1-p)^{n/2^L} = O(n^{-5}).$$

Hence every pair (i, j) has a witness in $\{k_1, \ldots, k_s\}$, and these witnesses can be found by scanning the non-zero entries of $A_{*k_1}, \ldots, A_{*k_s}$. But there can be only $O(n\log n)$ non-zero entries in these columns. This takes $O(n\log n)$ time for each such j. Hence witnesses for all pairs can be found in $O(n^2\log n)$ time with probability at least $1 - n^{-3}$.
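The column-capped scan of this section can be sketched as follows. For each column j of B, at most `s_cap` of its 1s are used (the role of $n/2^l$), and the corresponding columns of A are scanned. The function names are hypothetical. The density-to-cap helper is an assumption-laden illustration: it uses natural logarithms and maps p to its interval as described above, up to boundary conventions. It assumes p is known; Remark 3 below explains how an estimate q can be used instead.

```python
import math


def witnesses_random_a_fixed_b(A, B, s_cap):
    """For each column j of B, scan the A-columns of at most s_cap of its 1s.
    Witnesses are 1-based; 0 means no witness was found."""
    n = len(A)
    A_cols = [[i for i in range(n) if A[i][k]] for k in range(n)]  # 1s of A_{*k}
    W = [[0] * n for _ in range(n)]
    for j in range(n):
        ks = [k for k in range(n) if B[k][j]][:s_cap]  # first s_cap row positions
        for k in ks:
            for i in A_cols[k]:
                W[i][j] = k + 1
    return W


def cap_for_density(n, p):
    """Hypothetical helper: n // 2^l for the interval 2^l * m < p <= 2^(l+1) * m,
    with m = 10 ln n / n and l capped at L; no cap in the sparse case."""
    m = 10 * math.log(n) / n
    if p <= m:
        return n  # sparse case: scan all 1s of each column of B
    L = max(0, math.floor(math.log2(n / (10 * math.log(n)))))
    l = min(L, math.floor(math.log2(p / m)))
    return max(1, n // 2 ** l)


print(cap_for_density(10 ** 6, 0.5))  # 488
```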

Remark 3. The working of the algorithm depends on the value of p, and our analysis assumes that the algorithm knows p. However, as shown in the previous section, we can get an estimate q such that $q = p[1 \pm o(1)]$ with very high probability, as long as $p \ge n^{-2+\epsilon}$ for any positive constant $\epsilon$. Once we have q, we can find the range into which it falls and proceed accordingly. Since q is not the same as p, q may fall into an interval other than the one to which p belongs. But since $q = p[1 \pm o(1)]$, it can only fall into an adjacent interval. So we modify the algorithm to run for these adjacent intervals as well. This only increases the running time by a constant factor. Thus the algorithm runs in $O(n^2\log n)$ time even when we use q in place of p. When $p < n^{-2+\epsilon}$, the number of 1s in A is $O(n^{\epsilon})$ with probability $\ge 1 - O(n^{-5})$. But this means that the total number of witnesses for all pairs is only $O(n^{1+\epsilon})$.

Remark 4. It should be noted that the algorithm is the same for all B and so is the analysis. However, the analysis holds only when B is assumed to be fixed. The same result holds when A is arbitrary but fixed and B is a random boolean matrix. This is because the definition of matrix multiplication is symmetric with respect to A and B: computing $A \cdot B$ is equivalent to computing $B^T \cdot A^T = (A \cdot B)^T$. Hence, when B is random and A is arbitrary, $B^T \cdot A^T$ can be computed using the algorithm described in this section.
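The transposition identity used in Remark 4 can be checked directly for the boolean product. The helper names below are hypothetical, and the product is computed from the definition only to illustrate $A \cdot B = (B^T \cdot A^T)^T$.

```python
def transpose(M):
    return [list(row) for row in zip(*M)]


def bool_product(A, B):
    """Boolean product from the definition: c_ij = OR_k (a_ik AND b_kj)."""
    n = len(A)
    return [[int(any(A[i][k] and B[k][j] for k in range(n))) for j in range(n)]
            for i in range(n)]


X = [[1, 0, 1], [0, 0, 0], [1, 1, 0]]
Y = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
# A random-B instance can be handled as B^T A^T with B^T in the "random A" role.
print(bool_product(X, Y) == transpose(bool_product(transpose(Y), transpose(X))))  # True
```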



5 Computing the Diameter

Based on the results developed so far, we show how to compute the diameter of random graphs. For an undirected graph G = (V, E) and $u, v \in V$, we define d(u, v) to be the length of a shortest path between u and v if one exists; otherwise, we define $d(u, v) = \infty$. For directed graphs, we consider directed paths for distance calculation. The diameter of G, diam(G), is defined as the maximum distance between any pair of vertices: $\mathrm{diam}(G) = \max\{d(u, v) \mid u, v \in V\}$.

If we adopt the algebraic approach, we can find the diameter in $O(n^{\omega})$ time using the shortest path algorithms developed in [2, 16]. As pointed out in [1], this seems to be the only known approach for computing the diameter. Chung [8] had earlier asked whether there exists an algorithm for diameter computation which does not use fast matrix multiplication. Basch et al. [6] and Aingworth et al. [1] present combinatorial algorithms. The algorithm of [6] requires $O(n^3/(\log n)^2)$ time to verify whether an arbitrary graph has diameter d, for any fixed $d \ge 2$. [1] presents an $O(n^{2.5}\sqrt{\log n})$ algorithm for diameter computation with an additive error of at most 2. These running times are worst-case bounds. In contrast, the results of this section show that there is an $O(n^2\log n)$ algorithm for computing the diameter exactly if the input is a random graph.

First consider the undirected random graph G = (V, E) drawn from the model $\mathcal{G}(n, p)$, in which each of the $\binom{n}{2}$ edges is chosen independently with probability $p = p(n)$. When $p(n) \le 30\log n/n$, by CH bounds there are only $O(n\log n)$ edges in the random graph, with probability at least $1 - O(n^{-3})$. In this case, we can apply the straightforward approach (breadth-first search starting at every vertex) to find the diameter in $O(n^2\log n)$ time. Hence we assume that $p(n) > 30\log n/n$. First, we prove (in Section 6) that with probability at least $1 - O(n^{-3})$, there is a d-path between any pair of vertices in G, where $d = \log n/\log\log n$, and hence $\mathrm{diam}(G) \le d$. In what follows, we show how to find the exact value of diam(G) in $O(n^2\log n)$ time.

Notation: For a pair (i, j), a walk of length $l \ge 0$ between i and j is a sequence of alternating vertices and edges that starts at i, ends at j, and contains l edges. A path is a walk in which no edge or vertex appears more than once. For any pair (i, j), there exists a walk of length at most l iff there exists a path of length at most l. For a vertex i and an integer $l \ge 0$, let $N_l(i)$ (and $n_l(i)$) denote the set (and number) of all vertices j such that $d(i, j) \le l$. The adjacency matrix A of G is defined by $a_{ij} = 1$ if either i = j or $(i, j) \in E$. This is slightly different from the usual definition, in which $a_{ii} = 0$. The following facts are easy to verify.
Fact 5.1 (i) A is a random boolean matrix⁴. (ii) A represents all pairs which are at distance at most 1. (iii) In $A^2$ (boolean multiplication), a pair i, j has a 1-entry if and only if there is a walk (and hence a path) of length at most 2 between i and j in G. In general, $A^l$ has a 1-entry in the (i, j)-th position if and only if there is a path of length at most l between i and j. (iv) d(i, j) is the least value of l such that $A^l$ has a 1 in its (i, j)-th entry.
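Fact 5.1(iv) can be illustrated by computing distances from successive boolean powers of A with 1s on the diagonal. The sketch below is an $O(n^4)$ illustration of the fact only, not the paper's $O(n^2\log n)$ procedure. The function names are hypothetical, and `float("inf")` encodes unreachable pairs.

```python
def with_self_loops(adj):
    n = len(adj)
    return [[1 if i == j or adj[i][j] else 0 for j in range(n)] for i in range(n)]


def bool_mult(X, Y):
    n = len(X)
    return [[int(any(X[i][k] and Y[k][j] for k in range(n))) for j in range(n)]
            for i in range(n)]


def distances_by_powers(adj):
    """d(i, j) = least l with (A^l)_ij = 1, where A has 1s on the diagonal."""
    n = len(adj)
    A = with_self_loops(adj)
    INF = float("inf")
    D = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    P, l = A, 1                      # P = A^l
    while l < n:                     # paths have length at most n - 1
        for i in range(n):
            for j in range(n):
                if P[i][j] and D[i][j] == INF:
                    D[i][j] = l
        P, l = bool_mult(A, P), l + 1
    return D


# Path 0-1-2-3 plus an isolated vertex 4.
adj = [[0, 1, 0, 0, 0], [1, 0, 1, 0, 0], [0, 1, 0, 1, 0],
       [0, 0, 1, 0, 0], [0, 0, 0, 0, 0]]
D = distances_by_powers(adj)
print(D[0][3], D[1][3], D[0][4])  # 3 2 inf
```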

Case $p(n) > 10\sqrt{\log n/n}$:

By CH bounds, with probability $> 1 - n^{-3}$, any two vertices share at least one common neighbor, and hence the diameter is at most 2. Thus $A^2$ has a 1-entry everywhere. We compute $A^2$ and its witnesses by setting B = A and applying the algorithm outlined in Section 3. By Remark 2, this algorithm runs in $O(n^2\log n)$ time and succeeds with the required probability. Now, by comparing A and $A^2$, we can compute the distances and shortest paths for every pair of vertices. We should note that even if the diameter is at most 2, it is not easy to verify this quickly in the worst case.

Case $30\log n/n < p(n) \le 10\sqrt{\log n/n}$:

Initially, we set B = A (first iteration). Then we update B = AB for at most d iterations. Each time we compute a product, we also compute its witnesses. We stop the procedure the first time we notice that all entries of B are 1. If this happens after the l-th iteration, then the diameter of G is l. Since the diameter is at most d, we need at most d iterations. To analyze these multiplications, we can neither directly use the algorithm and its analysis given in Section 3 (since after the second iteration, B ceases to be truly random) nor directly use the analysis of Section 4 (since B is not fixed). However, by carefully estimating the number of 1s in successive B's and using this to decide when the multiplication algorithm should stop, we can perform all multiplications (without losing correctness) in a total time of $O(n^2\log n)$. The details are given below.

For a pair i, j such that $b_{ij}$ becomes 1 for the first time in the l-th iteration, we record the product witness (for this pair in this iteration) as the witness for a shortest path between i and j. We also record that $d(i, j) = l$. Thus, we can not only correctly compute the diameter, but also compute the shortest paths (through witnesses) between all pairs in G. The following lemma helps us bound the running time tightly. Its proof is provided in the full version of the paper.

Lemma 1. With probability $\ge 1 - O(n^{-3})$, for some L ($1 \le L \le d - 3$) and some $s \in \{0, 1, 2\}$, the following hold for all $i \in V$: (i) $n_{L-1}(i) < 1/(p\log n) \le n_L(i)$ and $n_{L-1}(i) = o(n_L(i))$; (ii) $n_{L+s-1}(i) < 5\log n/p \le n_{L+s}(i)$ and $n_{L+s-1}(i) = o(n_{L+s}(i))$; (iii) every vertex not in $N_{L+s}(i)$ is adjacent to at least one of the first $4\log n/p$ vertices of $N_{L+s}(i) - N_{L+s-1}(i)$.

By Fact 5.1, after the l-th iteration, the set of 1s in $B_{i*}$ and in $B_{*i}$ is precisely $N_l(i)$. So we perform B = AB until $n_l(i) \ge 5\log n/p$ for all i. By Lemma 1, this happens exactly after the (L + s)-th iteration. This means that until the (L + s)-th iteration, we have $|B_{*i}| < 5\log n/p$. Similarly, until the L-th iteration, we have $|B_{*i}| < 1/(p\log n)$. Until the L-th iteration, we compute B = AB by scanning, for each j, the O(np) 1s in $A_{*k}$ for each k such that $b_{kj} = 1$; there are at most $1/(p\log n)$ such k's. The total running time for iterations $1, \ldots, L$ is $O(n^2 L + nL \cdot np/(p\log n)) = O(n^2\log n)$. For iterations $L + 1, \ldots, L + s$, we multiply in the same way, taking $O(n^2 s + ns \cdot np\log n/p)$ time. Since $s \le 2$, this time is $O(n^2\log n)$. For the (L + s + 1)-th iteration, we consider only the first $5\log n/p$ non-zero entries in each column of B when computing AB. By Lemma 1(iii), this will be the last iteration, and it takes $O(n^2\log n)$ time. So the total time taken is $O(n^2\log n)$ and the success probability is $1 - O(n^{-3})$.

⁴ The diagonal entries of A are not random, but our algorithms for random BMM and BMWP work even if the diagonal entries are not random.
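The bookkeeping of the iteration B = AB can be sketched as follows. When $b_{ij}$ turns 1 for the first time, the product witness k is recorded as the first vertex after i on a shortest i-j path. Because $a_{ii} = 1$ but $b_{ij}$ was 0, k cannot be i. This is a straightforward $O(n^3)$-per-iteration illustration under the stated conventions. It deliberately omits the paper's truncation tricks (scanning only the first $5\log n/p$ entries per column) that give the $O(n^2\log n)$ bound, and the function name is hypothetical.

```python
def diameter_with_next_hops(adj):
    """Repeated B <- A B with A = adjacency matrix plus self-loops.

    Returns (diameter, D, nxt), where D[i][j] = d(i, j) and nxt[i][j] is the
    first vertex after i on a shortest i-j path (None if i == j or unreachable).
    """
    n = len(adj)
    A = [[1 if i == j or adj[i][j] else 0 for j in range(n)] for i in range(n)]
    INF = float("inf")
    D = [[0 if i == j else (1 if A[i][j] else INF) for j in range(n)] for i in range(n)]
    nxt = [[j if i != j and A[i][j] else None for j in range(n)] for i in range(n)]
    B = [row[:] for row in A]
    l = 1
    while l < n - 1 and not all(all(row) for row in B):
        l += 1
        new_B = [row[:] for row in B]  # (A B)_ij >= b_ij because a_ii = 1
        for i in range(n):
            for j in range(n):
                if B[i][j]:
                    continue
                for k in range(n):
                    if A[i][k] and B[k][j]:  # k witnesses (A B)_ij
                        new_B[i][j] = 1
                        D[i][j], nxt[i][j] = l, k  # first time: d(i, j) = l
                        break
        B = new_B
    return max(max(row) for row in D), D, nxt


path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
diam, D, nxt = diameter_with_next_hops(path)
print(diam, nxt[0][3], nxt[3][0])  # 3 1 2
```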

The proof of the diameter bound (given in Section 6) extends to random digraphs. Hence the algorithm outlined in this section can also be used to compute the diameter of random digraphs. We formally state the algorithmic results obtained in Sections 3, 4 and 5 below.

Theorem 1. Let A and B be two boolean matrices. Let G be a random graph/digraph in which each possible edge/arc is chosen with probability p. The following hold.

1. If A and B are both random, then their product and its witnesses can be computed in $O(n^2\log n)$ time with probability at least $1 - O(n^{-3})$.

2. If one of A and B is random and the other is arbitrary but fixed, then their product and its witnesses can be computed in $O(n^2\log n)$ time with probability at least $1 - O(n^{-3})$.

3. The diameter of G, and the distances and shortest paths between all pairs, can be computed in $O(n^2\log n)$ time. The failure probability is $O(n^{-3})$.

6 $\mathrm{diam}(G(n,p)) \le \log n/\log\log n$, $p \ge 30\log n/n$

The diameter of random graphs has been studied before [7, 12]. However, these results only state that for certain ranges of p(n), the diameter is sharply concentrated on a few values. What we need is a bound on the diameter for a broader range of p(n), so that we can bound the running time. Also, these results do not seem to have the required rate of convergence of the failure probability, whereas we need a guaranteed failure probability of at most $n^{-3}$ to obtain faster (on the average) algorithms. Hassin and Zemel [11] mention a bound of $O(\log n)$ on the diameter. We prove a stronger bound of $\log n/\log\log n$ in the following theorem. The proof uses Janson inequalities. Note that the theorem actually makes a stronger statement than an upper bound on the diameter.

Theorem 2. Let G(n, p) be a random graph/digraph on n vertices obtained by choosing each edge/arc with probability $p \ge 30\log n/n$. Then, with probability at least $1 - O(n^{-3})$, there is a d-path ($d = \log n/\log\log n$) in G between every pair of vertices.
Proof. We give the proof only for random graphs; for random digraphs, the arguments are similar. Write $p = w/n$ to simplify the calculations. We have $w \ge 30\log n$.⁵ Fix two vertices u and v in V. For $d = \log n/\log\log n$, let $P_1, \ldots, P_m$, $m = (n-2)\cdots(n-d)$, be the collection of all possible d-paths (paths of length d) between u and v in G. For each i, let $B_i$ be the event that $P_i$ is present in G. Also, let $n_d(u, v)$ denote the number of d-paths between u and v in G. We have

$$\mu = E(n_d(u, v)) = (n-2)\cdots(n-d)\,(w/n)^d = (w^d/n)[1 + o(1)].$$

Since $w \ge 30\log n$, we have $\mu = \omega(1)$.

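We use the Generalized Janson Inequality (GJI) [4, Chapter 8] to prove that $\Pr(\text{there is no } d\text{-path between } u \text{ and } v) = O(n^{-5})$. By a union bound over all pairs of vertices, this proves the required bound on the diameter. The inequality is used as follows. For any $i \ne j$, $1 \le i, j \le m$, we write $i \sim j$ if the paths $P_i$ and $P_j$ share at least one edge. For any $i$, define $\Delta_i = \sum_{j \sim i} \Pr(B_i \wedge B_j)$. Define $\Delta = \sum_i \Delta_i$ and $\varepsilon = \Pr(B_i) = (w/n)^d = o(1)$. GJI states that if $\Delta \ge \mu(1-\varepsilon)$, then

$$\Pr(\overline{B_i} \text{ holds for all } i) \le e^{-\mu^2(1-\varepsilon)/(2\Delta)}.$$

We can verify that $\Delta \ge \mu$. Thus, if we can prove that $\Delta \le 3\mu^2/w$, it follows that $\Pr(\bigwedge_i \overline{B_i}) \le e^{-(1-\varepsilon)w/6} = O(n^{-5})$.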

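Written out, the final step combines the bound on $\Delta$ with GJI and then applies a union bound over pairs of vertices. In this worked chain $\log$ is read as the natural logarithm; for base-2 logarithms the bound only improves, since $\log_2 n \ge \ln n$.

```latex
\Pr\Big(\bigwedge_i \overline{B_i}\Big)
  \le \exp\!\Big(-\frac{\mu^2(1-\varepsilon)}{2\Delta}\Big)
  \le \exp\!\Big(-\frac{\mu^2(1-\varepsilon)}{2\cdot 3\mu^2/w}\Big)
  = e^{-(1-\varepsilon)w/6}
  \le n^{-5(1-\varepsilon)} = O(n^{-5}),
\qquad
\Pr\big(\exists\,u,v:\ \text{no } d\text{-path}\big) \le n^2\cdot O(n^{-5}) = O(n^{-3}).
```

The step $n^{-5(1-\varepsilon)} = O(n^{-5})$ uses $\varepsilon \log n \to 0$, which holds because $\varepsilon = (w/n)^d$ is tiny under the assumption $w \le n(\log\log n)/(\log n)$.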
Fix an $i$ and let $P_i$ be the path $(u, u_1, \ldots, u_{d-1}, v)$. Consider any $j$ such that $i \sim j$. Then $P_j$ shares at least one vertex (other than $u$ and $v$) and at least one edge with $P_i$. Let $v(i,j)$ and $e(i,j)$ denote, respectively, the number of common vertices (excluding $u$ and $v$) and the number of common edges shared by $P_i$ and $P_j$.

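Let $J_i = \{j : j \sim i\}$. Partition $J_i = J_i^1 \cup J_i^2$, where $J_i^1 = \{j \in J_i : v(i,j) = e(i,j)\}$ and $J_i^2 = \{j \in J_i : v(i,j) \ge e(i,j) + 1\}$. Clearly, we cannot have $v(i,j) < e(i,j)$. Further partition

$$J_i^1 = \bigcup_{1 \le l_e \le d-1} J_i^1(l_e), \qquad J_i^1(l_e) = \{j \in J_i^1 : e(i,j) = l_e\},$$

$$J_i^2 = \bigcup_{1 \le l_e \le d-1} J_i^2(l_e), \qquad J_i^2(l_e) = \{j \in J_i^2 : e(i,j) = l_e\}.$$

Note that $e(i,j)$ always lies between $1$ and $d-1$.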

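Clearly, for any $j \in J_i$, we have $\Pr(B_i \wedge B_j) = (w/n)^{2d - e(i,j)}$. Hence, for any $l_e$,

$$\sum_{j \in J_i^2(l_e)} \Pr(B_i \wedge B_j) = (w/n)^{2d - l_e}\,|J_i^2(l_e)|. \qquad (1)$$

Since $v(i,j) \ge e(i,j) + 1$ for each $j \in J_i^2$, we have $|J_i^2(l_e)| \le n^{d-2-l_e}\, d^{l_e}$. In this expression, $d^{l_e}$ bounds the number of different subsets of $P_i$ of size $l_e$.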

³ We also assume that $w \le n(\log\log n)/(\log n)$. Otherwise, any two vertices in $V$ share a common neighbor with probability $\ge 1 - O(n^{-…})$, and hence $\operatorname{diam}(G) \le 2$.
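Having fixed such a subset, $n^{d-2-l_e}$ bounds the number of $j$'s that can be in $J_i^2(l_e)$. Using this in (1), we get

$$\sum_{j \in J_i^2(l_e)} \Pr(B_i \wedge B_j) \le (w/n)^{2d-l_e}\, n^{d-2-l_e}\, d^{l_e} \le (w/n)^d\,\mu/n, \quad \text{since } d < w. \qquad (2)$$

Summing (2) over $1 \le l_e \le d-1$, and using $dw \le n$ (which follows from the assumption $w \le n(\log\log n)/(\log n)$), we get

$$\sum_{j \in J_i^2} \Pr(B_i \wedge B_j) \le (w/n)^d\, d\mu/n \le (w/n)^d\,\mu/w. \qquad (3)$$

Now we turn our attention to $J_i^1$. For any $j \in J_i^1(l_e)$, note that $P_i \cap P_j$ must be the union of the first $l'$ edges and the last $l''$ edges of $P_j$ for some $l' + l'' = l_e$; otherwise, we cannot have $v(i,j) = e(i,j)$. For a given $l_e$, there are only $l_e + 1$ such possibilities. Also, for a given subset of size $l_e$, there are at most $n^{d-1-l_e}$ such $j$'s that can be in $J_i^1(l_e)$. Using these, we get for each $l_e$

$$\sum_{j \in J_i^1(l_e)} \Pr(B_i \wedge B_j) \le (w/n)^{2d-l_e}\, n^{d-1-l_e}\,(l_e+1) \le (w/n)^d\,\mu\,(l_e+1)/w^{l_e}.$$

Hence

$$\sum_{j \in J_i^1} \Pr(B_i \wedge B_j) \le (w/n)^d\,\mu \sum_{1 \le l_e \le d-1} (l_e+1)/w^{l_e}. \qquad (4)$$

Now, using the fact that $l_e \le d - 1 = o(w)$, we deduce that $\sum_{l_e} (l_e+1)/w^{l_e} \le 2/w$, ignoring the $[1 + o(1)]$ factor. Combining (3) and (4), we get $\Delta_i \le (w/n)^d\, 3\mu/w$. Hence

$$\Delta = \sum_i \Delta_i \le m\,(w/n)^d\, 3\mu/w = 3\mu^2/w,$$

since $m\,(w/n)^d = \mu$.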


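The two counting facts used above can be sanity-checked by brute force on small complete graphs. The first is $v(i,j) \ge e(i,j)$. The second is that, when $v(i,j) = e(i,j)$, the shared edges form a prefix plus a suffix of the path. The following Python sketch enumerates every $d$-path between $u$ and $v$ in $K_n$, so it is only feasible for tiny $n$ and $d$. The helper names are hypothetical, and the check is an illustration, not part of the original proof.

```python
from itertools import permutations


def d_paths(n, u, v, d):
    """All simple u-v paths with exactly d edges in K_n, as vertex tuples."""
    inner = [x for x in range(n) if x not in (u, v)]
    return [(u, *mid, v) for mid in permutations(inner, d - 1)]


def edge_list(path):
    """Edges of a path in order, each as an unordered pair."""
    return [frozenset(e) for e in zip(path, path[1:])]


def is_prefix_plus_suffix(path, shared):
    """True if `shared` equals the first l' edges plus the last l'' edges of `path`."""
    edges = edge_list(path)
    k = len(shared)
    return any(set(edges[:a]) | set(edges[len(edges) - (k - a):]) == shared
               for a in range(k + 1))


def check(n, d, u=0, v=1):
    """Check the overlap facts for every P_j against a fixed P_i."""
    paths = d_paths(n, u, v, d)
    p_i = paths[0]
    for p_j in paths[1:]:
        shared = set(edge_list(p_i)) & set(edge_list(p_j))
        if not shared:                      # P_j shares no edge with P_i
            continue
        v_ij = len((set(p_i) & set(p_j)) - {u, v})
        e_ij = len(shared)
        assert 1 <= e_ij <= d - 1           # e(i,j) lies between 1 and d - 1
        assert v_ij >= e_ij                 # v(i,j) < e(i,j) never happens
        if v_ij == e_ij:                    # the case v(i,j) = e(i,j)
            assert is_prefix_plus_suffix(p_i, shared)
            assert is_prefix_plus_suffix(p_j, shared)
    return True
```

Since the roles of $P_i$ and $P_j$ are symmetric, the sketch checks the prefix-plus-suffix shape on both paths.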

7 Conclusions 

In this paper, we introduced an alternative approach to boolean matrix multiplication and showed that it leads to almost optimal (on the average) combinatorial algorithms for solving the BMM and BMWP problems and for computing the diameter and all-pairs shortest paths of a graph/digraph. The constant 30 stated in Theorem 6.1 can be decreased to 18, with a corresponding increase in the failure probability to $O(n^{-1})$. It may be possible to bring it below 18. Some interesting questions raised by these results are given below:

1. Does there exist an implementation of the A-columns-B-rows approach which finds witnesses in $o(n^3)$ time (randomized or deterministic) in the worst case? Or, more generally, is it possible to maintain a collection of subsets of a universe so that their intersections can be computed efficiently?

2. In Section 4, can the requirement that $B$ is fixed be removed? In other words, can one design a fast algorithm for witnesses when $A$ is random and $B$ is arbitrary?
3. Can we remove the $\log n$ factor from the algorithms of Sections 3 and 4?

4. Can the all-pairs-shortest-paths problem be solved in $O(n^{3-\epsilon})$ time without using matrix multiplication?

Acknowledgement: The second author thanks Volker Priebe for bringing to his attention the work of the first author and also the work of [11].
Abstract. We study the power of randomization in the design of online graph coloring algorithms. Until now, no specific network topology was known for which randomized online algorithms perform substantially better than deterministic algorithms. We present randomized lower bounds for online coloring of some well-studied network topologies.

We show that no randomized algorithm for online coloring of interval graphs achieves a competitive ratio strictly better than the best known deterministic algorithm [KT81].

We also present a first lower bound on the competitive ratio of randomized algorithms for path coloring on tree networks, thus answering an open question posed in [BEY98]. We prove an $\Omega(\log \Delta)$ lower bound for trees of diameter $\Delta = O(\log n)$, which compares with the known $O(\Delta)$-competitive deterministic algorithm for the problem, thus still leaving open the question of whether randomization helps for this specific topology.



1 Introduction 

In this paper we present randomized lower bounds for a class of online graph coloring problems. The input instance to an online graph coloring problem is a sequence $\sigma = \{v_1, \ldots, v_{|\sigma|}\}$ of vertices of a graph. The algorithm must color the vertices of the graph following the order of the sequence. When the color is assigned to vertex $v_i$, the algorithm can only see the graph induced by vertices $\{v_1, \ldots, v_i\}$. The goal of a graph coloring algorithm is to use as few colors as possible under the constraint that adjacent vertices receive different colors.

Online graph coloring problems have been studied by several authors [KT91,HS92,LST89,Vis90]. The study of online graph coloring actually started even before the notion of competitive analysis of online algorithms was introduced [ST85]. Kierstead and Trotter [KT81] in 1981 considered the online coloring problem for interval graphs. Every vertex of an interval graph is associated with an interval of the line. Two vertices are adjacent if the two corresponding intervals intersect. Since interval graphs are perfect graphs [Gol80], their chromatic number $\chi$ equals the maximum clique size $\omega$, i.e. the maximum number of intervals overlapping at the same point of the line.

In [KT81] a deterministic online algorithm is presented that colors an interval graph of chromatic number $\omega$ with $3\omega - 2$ colors. The authors also prove that the $3\omega - 2$ bound is tight: for every deterministic algorithm there exists an input sequence on which the algorithm uses at least $3\omega - 2$ colors.

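To make the online model concrete, here is a minimal Python sketch of First-Fit. First-Fit is a simple greedy online rule: each arriving interval gets the smallest color not used by an earlier interval it overlaps. It only illustrates the model and is not the algorithm of [KT81]. Intervals are assumed closed, and the names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # closed interval [left, right]


def overlaps(a: Interval, b: Interval) -> bool:
    """Two closed intervals overlap iff neither ends before the other starts."""
    return a[0] <= b[1] and b[0] <= a[1]


def first_fit(sequence: List[Interval]) -> List[int]:
    """Color intervals in arrival order with colors 0, 1, 2, ...

    Interval t gets the smallest color not used by an earlier interval that it
    overlaps; later intervals are never consulted, as in the online model."""
    colors: List[int] = []
    for t, current in enumerate(sequence):
        used = {colors[s] for s in range(t) if overlaps(sequence[s], current)}
        color = 0
        while color in used:
            color += 1
        colors.append(color)
    return colors
```

Each decision depends only on the prefix of the sequence seen so far. This is exactly what an adversary exploits when it builds sequences that force an online algorithm to use many more colors than $\omega$.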
The online interval graph coloring problem has a natural extension to trees. In this case, every vertex of the graph, called the intersection graph, is associated with a path on a tree network. Two vertices are adjacent in the graph if the two corresponding paths intersect. This problem has recently received growing attention due to its application to wavelength assignment in optical networks [RU94,BL97,GSR96].

Several authors give an $O(\Delta)$-competitive deterministic algorithm for the problem of coloring online paths on a tree network (see for instance [BL97,GSR96]), where $\Delta$ is the diameter of the graph. Bartal and Leonardi [BL97] also show an almost matching $\Omega(\Delta/\log \Delta)$ deterministic lower bound on a tree of diameter $\Delta = O(\log n)$, where $n$ is the number of vertices of the graph.

In this paper we present the first lower bounds on the competitive ratio of randomized algorithms for online interval graph coloring and online coloring of paths on tree networks.

Randomized algorithms for online problems [BDBK+90] have often been proved to achieve competitive ratios that are strictly better than those of deterministic online algorithms. The competitive ratio of a randomized algorithm against an oblivious adversary is defined as the maximum, over all input sequences, of the ratio between the expected online cost and the optimal offline cost. The oblivious adversary generates the input sequence for a given algorithm without knowledge of the random choices of the algorithm.

However, no network topology is known for which a randomized online coloring algorithm achieves a substantially better competitive ratio than the best deterministic algorithm for the problem. The first result we present in this paper points in the same direction.

We present the first randomized lower bound, to the best of our knowledge, for online coloring of interval graphs. We show that any randomized algorithm uses an expected number of colors of at least $3\omega - 2 - o(1/\omega)$ for an interval graph of maximum clique size $\omega$, thus proving that randomization does not substantially improve upon the best deterministic algorithm of [KT81].

Our second result is the first randomized $\Omega(\log \Delta)$ lower bound for online coloring of paths on a tree network of diameter $\Delta = O(\log n)$, thus answering an open question of Borodin and El-Yaniv [BEY98]. There is still a substantial gap between the presented lower bound and the $O(\Delta)$ deterministic upper bound known for the problem [BL97,GSR96].

The current status of the online path coloring problem on trees can be compared with the known results for the dual problem of selecting online a maximum number of edge-disjoint paths on a tree network, i.e. a maximum independent set in the corresponding intersection graph. An $O(\log \Delta)$-competitive randomized algorithm is possible for the online edge-disjoint paths problem on trees [AGLR94,LMSPR98], against an $\Omega(\Delta)$ deterministic lower bound obtained even on a line network of diameter $\Delta = n$ [AAP93]. Our result still leaves open the question of whether an $O(\log \Delta)$-competitive randomized algorithm is possible for the online path coloring problem on tree networks.

Irani [Ira90] studies the problem of coloring inductive graphs online. A graph is $d$-inductive if the vertices of the graph can be numbered 1 through $n$ in such a way that each vertex is connected to at most $d$ vertices with higher numbers. Irani shows that any $d$-inductive graph can be colored online with $O(d \log n)$ colors and presents a matching $\Omega(\log n)$ deterministic lower bound. The intersection graph of paths on a tree network has been independently observed to be $(2\omega - 1)$-inductive by [BL97] and by Kleinberg and Molloy, as reported in [BEY98]. Our lower bound for online path coloring on trees thus implies a first $\Omega(\log\log n)$ lower bound on the competitive ratio of randomized algorithms for online coloring of inductive graphs.

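The smallest $d$ for which a graph is $d$-inductive is its degeneracy. A witnessing numbering can be found by repeatedly removing a vertex of minimum remaining degree. The following is a minimal Python sketch of that standard greedy procedure. It is not taken from [Ira90]; the adjacency is assumed to be a symmetric dict of sets, and the names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def inductiveness(adj: Dict[int, Set[int]]) -> Tuple[int, List[int]]:
    """Smallest d such that the graph is d-inductive, plus a witnessing numbering.

    Vertices are numbered in removal order; a vertex removed with k remaining
    neighbours has exactly k neighbours with higher numbers."""
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    order: List[int] = []
    d = 0
    while remaining:
        v = min(remaining, key=lambda x: len(remaining[x]))  # minimum remaining degree
        d = max(d, len(remaining[v]))
        for w in remaining[v]:
            remaining[w].discard(v)
        del remaining[v]
        order.append(v)
    return d, order
```

For example, paths are 1-inductive, cycles 2-inductive, and the complete graph $K_4$ is 3-inductive.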
We conclude this section by mentioning previous work on randomized online coloring algorithms for general graphs. Vishwanathan [Vis90] gives an $O(n/\log n)$-competitive randomized algorithm, improving over the $O(n/\log^* n)$ deterministic bound of Lovász, Saks and Trotter [LST89]. Halldórsson and Szegedy [HS92] give an $\Omega(n/\log^2 n)$ randomized lower bound for the problem. Bartal, Fiat and Leonardi [BFL96] study the model in which a graph $G$ is known in advance to the online algorithm. The sequence $\sigma$ may contain only a subset of the vertices of $G$, and the algorithm must color the subgraph of $G$ induced by the vertices of $\sigma$. The authors show that even under this model an $\Omega(n^\epsilon)$ randomized lower bound, for a fixed $\epsilon > 0$, is possible.

The paper is structured as follows. Section 2 presents the lower bound on 
online coloring of interval graphs. Section 3 presents the lower bound for path 
coloring on tree networks. Conclusions and open problems are in Section 4. 



2 A lower bound for online interval graph coloring

In this section we present a lower bound on the competitive ratio of randomized 
algorithms for online interval graph coloring. 

The input instance to the online interval graph coloring problem is given by a sequence of intervals on a line. Every interval is denoted by its two endpoints on the line. The algorithm must color the intervals one by one, in the order in which they appear in the sequence. The goal is to use as few colors as possible under the constraint that any two overlapping intervals are assigned different colors.

The competitive ratio of an online algorithm for the interval graph coloring problem is the maximum, over all input sequences, of the ratio between the expected number of colors used by the algorithm and the chromatic number of the interval graph, i.e. the maximum number $\omega$ of intervals overlapping at the same point of the line.

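The quantity $\omega$ in the denominator can be computed with a standard sweep over the interval endpoints. The following minimal Python sketch assumes closed intervals, so two intervals that merely touch at an endpoint count as overlapping. The function name is hypothetical.

```python
from typing import List, Tuple


def max_overlap(intervals: List[Tuple[float, float]]) -> int:
    """omega: the largest number of closed intervals [l, r] containing a common point."""
    events = []
    for left, right in intervals:
        events.append((left, 0))   # 0 = opening; sorts before a closing at the same x
        events.append((right, 1))  # 1 = closing
    events.sort()
    current = best = 0
    for _, kind in events:
        if kind == 0:
            current += 1
            best = max(best, current)
        else:
            current -= 1
    return best
```

Dividing the number of colors an algorithm uses on a sequence by this value gives the per-sequence ratio. The competitive ratio is the worst case of that ratio, taken in expectation for randomized algorithms.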
A lower bound for randomized algorithms against an oblivious adversary is established by applying Yao's Lemma [Yao77] to online algorithms [BEY98,BFL96]: a lower bound on the competitive ratio of randomized algorithms is obtained by proving a lower bound on the competitive ratio of deterministic online algorithms for a specific probability distribution over the input sequences for the problem.

We first introduce some notation. We denote by $P_\omega$ the specific probability distribution over input sequences with chromatic number $\omega$ that we use to prove the lower bound. A probability distribution is described by a set of input sequences with chromatic number $\omega$, every input sequence being presented with equal probability.

We denote by $\sigma \in P$ a generic input sequence of probability distribution $P$. Input instance $\sigma$ is formed by a sequence of intervals $\{I_1, \ldots, I_{|\sigma|}\}$.

Probability distributions $P$ and $Q$ are said to be independent if, for any $I \in \sigma \in P$ and $I' \in \sigma' \in Q$, $I$ and $I'$ are disjoint intervals. The set of sequences of probability distribution $P \cup Q$ is obtained by concatenating every input sequence of $P$ with every input sequence of $Q$.

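If, as in the text, a distribution is a list of equally likely interval sequences, the union of independent distributions can be represented directly. The following Python sketch builds $P \cup Q$ by concatenating every sequence of $P$ with every sequence of $Q$, after checking the independence condition. The type and function names are hypothetical, and closed intervals are assumed.

```python
from itertools import product
from typing import List, Tuple

Interval = Tuple[float, float]        # closed interval [left, right]
Sequence = Tuple[Interval, ...]
Distribution = List[Sequence]         # uniform over the listed sequences


def independent(p: Distribution, q: Distribution) -> bool:
    """Every interval occurring in P is disjoint from every interval occurring in Q."""
    ip = {i for s in p for i in s}
    iq = {i for s in q for i in s}
    return all(a[1] < b[0] or b[1] < a[0] for a in ip for b in iq)


def union(p: Distribution, q: Distribution) -> Distribution:
    """P ∪ Q: all |P|·|Q| concatenations, each equally likely."""
    if not independent(p, q):
        raise ValueError("P and Q are not independent")
    return [sp + sq for sp, sq in product(p, q)]
```

Drawing a sequence uniformly from the result is the same as drawing one sequence from $P$ and one from $Q$ independently and presenting them one after the other.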
2.1 The probability distribution 

The probability distribution $P_\omega$ used for proving the lower bound is defined recursively. We use figures to describe the construction.

Probability distribution $P_1$ is formed by a single input sequence containing a single interval. $P_\omega$ is the union of $A$ independent probability distributions $P_\omega^1 \cup \cdots \cup P_\omega^A$, as described in Figure 1. The value of $A$ will be fixed later.

$P_\omega^j$ is obtained from four independent distributions $P_{\omega-1}^{j,1}, P_{\omega-1}^{j,2}, P_{\omega-1}^{j,3}, P_{\omega-1}^{j,4}$. The set of input sequences of $P_\omega^j$ is obtained by concatenating every input sequence of $P_{\omega-1}^{j,1} \cup P_{\omega-1}^{j,2} \cup P_{\omega-1}^{j,3} \cup P_{\omega-1}^{j,4}$ with each of 10 distinct subsequences $T_1, \ldots, T_{10}$, called configurations, of at most 4 intervals, as described in Figure 2. The intervals of each of the 10 configurations are numbered in Figure 2 following the order in which they appear in the sequence. Every probability distribution $P_{\omega-1}^{j,i}$ is generated by applying the present definition for $\omega - 1$.

Observe that every input sequence of $P_\omega$ has chromatic number $\omega$. This can easily be seen with an inductive argument. Probability distribution $P_1$ contains a single sequence with chromatic number 1. By induction, every input sequence from $P_{\omega-1}^{j,i}$, $i = 1, \ldots, 4$, has chromatic number $\omega - 1$. Every input sequence $\sigma \in P_{\omega-1}^{j,1} \cup P_{\omega-1}^{j,2} \cup P_{\omega-1}^{j,3} \cup P_{\omega-1}^{j,4}$ also has chromatic number $\omega - 1$, since the intervals of independent distributions are disjoint. One can check from Figure 2 that concatenating $\sigma$ with any of the 10 configurations increases the chromatic number by 1. Since $P_\omega$ is the union of $A$ independent probability distributions $P_\omega^1, \ldots, P_\omega^A$, every sequence of $P_\omega$ has chromatic number $\omega$.




Figure 1. The definition of the probability distribution.



2.2 The proof of the lower bound 

The proof of the lower bound is based on the following lemma:

Lemma 1. Any deterministic online algorithm uses at least $3\omega - 2$ colors with probability at least $1 - e^{-c}$ on an input sequence drawn from probability distribution $P_\omega$, if $A \ge 10c/(1 - e^{-c})^4$.

From the above lemma, choosing the constant $c$ large enough, say $c = \ln 3\omega^{…}$, we obtain the following theorem:

Theorem 1. For any randomized algorithm for online interval graph coloring, there exists an input sequence of chromatic number $\omega$ on which the expected number of colors used by the algorithm is at least $3\omega - 2 - o(1/\omega)$.

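The step from Lemma 1 to Theorem 1 can be written out as follows. By Yao's Lemma, the expected number of colors of any randomized algorithm on its worst sequence is at least the minimum, over deterministic algorithms, of the expected number of colors on a sequence drawn from $P_\omega$. By Lemma 1, that minimum is at least $(3\omega-2)(1-e^{-c})$. The choice $e^{-c} = 1/(3\omega^3)$ below is ours, and is just one sufficient option for obtaining the $o(1/\omega)$ term.

```latex
\mathbb{E}[\#\mathrm{colors}]
  \ge (3\omega-2)\,\Pr[\#\mathrm{colors} \ge 3\omega-2]
  \ge (3\omega-2)(1-e^{-c})
  = 3\omega-2-(3\omega-2)\,e^{-c},
\qquad
e^{-c} = \frac{1}{3\omega^{3}}
\;\Longrightarrow\;
(3\omega-2)\,e^{-c} < \frac{3\omega}{3\omega^{3}} = \frac{1}{\omega^{2}} = o(1/\omega).
```

Any $c$ with $e^{-c} = o(1/\omega^2)$ gives the same conclusion.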
The remaining part of this section is devoted to the proof of Lemma 1. The proof is by induction.

The claim of Lemma 1 holds for $\omega = 1$, since a deterministic algorithm uses one color for the sequence of probability distribution $P_1$, which contains one single interval. Assume the claim holds for probability distribution $P_{\omega-1}$, i.e. with probability at least $1 - e^{-c}$, the deterministic algorithm uses at least $3(\omega - 1) - 2$ colors for an input sequence drawn from probability distribution $P_{\omega-1}$.

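As an intermediate step of the proof, we prove a claim that holds for any probability distribution $P_\omega^j$, $j = 1, \ldots, A$. In the following we drop the index $j$ from the components, and denote by $P_{\omega-1}^i$, $i = 1, \ldots, 4$, the four distributions from which the generic distribution $P_\omega^j$ is obtained.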
Figure 2. The 10 configurations used to form the probability distribution.



Lemma 2. Consider a probability distribution $P_\omega^j$. Assume the deterministic algorithm uses at least $3(\omega - 1) - 2$ colors for every input sequence $\sigma^i$ drawn from probability distribution $P_{\omega-1}^i$, $i = 1, \ldots, 4$. Then, with probability at least $1/10$, the deterministic algorithm uses at least $3\omega - 2$ colors for an input sequence drawn from probability distribution $P_\omega^j$.

Proof. Denote by $C^i$ the set of colors used for a generic input sequence $\sigma^i$ drawn from probability distribution $P_{\omega-1}^i$, and let $C_{\omega-1} = \bigcup_{i=1}^4 C^i$. Let $C_\omega$ be the set of colors used for an input sequence from probability distribution $P_\omega^j$. We will prove that $|C_\omega| \ge 3\omega - 2$ with probability at least $1/10$.

We distinguish four cases on the basis of the value $c = |C_{\omega-1}| - (3(\omega - 1) - 2)$, the number of colors exceeding $3(\omega - 1) - 2$ used by the algorithm for the four sequences $\sigma^i$, $i = 1, \ldots, 4$. We separately consider the cases $c = 0, 1, 2$ and $c \ge 3$.
$c = 0$. In this case the deterministic algorithm uses the same set of colors for every sequence $\sigma^i$, $i = 1, \ldots, 4$. The new intervals presented in any of the 10 configurations must be assigned colors not in $C_{\omega-1}$. With probability $2/5$, one of configurations $T_1$, $T_2$, $T_3$ or $T_4$ is presented. In all these configurations, one of intervals $I_1$ and $I_2$ contains all the intervals of one of the sequences $\sigma^i$, and the other contains all the intervals of another one. We further distinguish two cases: a.) intervals $I_1$ and $I_2$ have been assigned the same color, say $c_1$; b.) intervals $I_1$ and $I_2$ have been assigned different colors, say $c_1$ and $c_2$.

a. With probability $1/2$, the sequence is completed by intervals $I_3$ and $I_4$ of configuration $T_1$ or $T_2$. Interval $I_3$ must be assigned a color different from $c_1$, say $c_2$, since it overlaps an interval colored $c_1$. Interval $I_4$ must be assigned a color different from $c_1$ and $c_2$, say $c_3$, since it overlaps both interval $I_3$ and an interval colored $c_1$. Then, with probability $1/5$, 3 more colors are used, and the claim is proved.

b. In this case, with probability $1/2$, the sequence is completed by interval $I_3$ of configuration $T_3$ or $T_4$. Interval $I_3$ is assigned a color different from $c_1$ and $c_2$, say $c_3$, since it overlaps both intervals $I_1$ and $I_2$. Also in this case, with probability $1/5$, 3 more colors are used, thus proving the claim.

$c = 1$. We prove that, with probability at least $1/10$, 2 colors not in $C_{\omega-1}$ are used by the deterministic algorithm. The difficulty of this case is due to the fact that a sequence $\sigma^i$ may use only a subset of the colors of $C_{\omega-1}$: a color of $C_{\omega-1}$ not used for any interval of a sequence $\sigma^i$ may be "re-used" for an interval overlapping the intervals of $\sigma^i$.

However, since any $C^i$ contains at least $|C_{\omega-1}| - 1$ colors, we can make use of the following simple fact:

Claim. For any two sequences $\sigma^i$ and $\sigma^j$, $i \ne j$, if $C^i \ne C^j$ then $C^i \cup C^j = C_{\omega-1}$.

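The Claim is a counting fact about color sets that each miss at most one color of $C_{\omega-1}$. A tiny exhaustive Python check confirms it for small palettes. The function name is hypothetical, and colors are encoded as integers.

```python
from itertools import combinations


def claim_holds(palette_size: int) -> bool:
    """Exhaustively check the Claim for |C_{ω-1}| = palette_size: any two distinct
    color sets that each miss at most one color of C_{ω-1} together cover C_{ω-1}."""
    palette = frozenset(range(palette_size))
    # Every admissible C^i: the whole palette, or the palette minus one color.
    admissible = [palette] + [palette - {x} for x in palette]
    # combinations() pairs up distinct list entries, which here are distinct sets.
    return all(a | b == palette for a, b in combinations(admissible, 2))
```

The assumption that each $C^i$ misses at most one color is exactly what makes this work. If a set could miss two colors, two different sets missing the same color would not cover the palette.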
The proof separately considers 4 different cases, distinguished on the basis of the maximum cardinality $s$ of a subset of $\{\sigma^1, \ldots, \sigma^4\}$ formed by sequences assigned the same set of colors. That is, every color is either assigned to an interval of every sequence of the subset, or not assigned at all to an interval of any sequence of the subset. We have four different cases, for $s = 1, 2, 3, 4$. (Every subcase also includes its symmetric counterpart, obtained by exchanging the roles of two pairs of sequences.)

$s = 1$. In this case, every two sequences have been assigned different sets of colors. By the Claim, $C^i \cup C^j = C_{\omega-1}$ for any two sequences $\sigma^i$ and $\sigma^j$. With probability $1/10$, configuration $T_9$ is given. Interval $I_1$ is assigned a color $c_1 \notin C_{\omega-1}$, since for any color of $C_{\omega-1}$, interval $I_1$ overlaps an interval assigned that color. For the same reason, a color $c_2 \notin C_{\omega-1}$ is assigned to $I_2$. Color $c_2$ must be distinct from $c_1$, since interval $I_2$ intersects interval $I_1$. The claim is then proved.

$s = 2$. The case in which at most two sequences are assigned the same set of colors is broken into three subcases: A.) $\sigma^?$ and $\sigma^?$, or $\sigma^?$ and $\sigma^?$, are assigned the same set of colors; B.) both $\sigma^?$ and $\sigma^?$, and $\sigma^?$ and $\sigma^?$, are assigned the same set of colors; C.) $\sigma^?$ and $\sigma^?$ are assigned different sets of colors, while $\sigma^?$ and $\sigma^?$ are assigned the same set of colors.

A.) Since $s = 2$, we have $C^? \ne C^?$ and $C^? \ne C^?$. The same argument as in case $s = 1$ applies here to prove the claim.

B.) In this case we know that $C^? = C^?$. With probability $2/5$, corresponding to configurations $T_1$, $T_3$, $T_5$ and $T_7$, interval $I_1$ includes all the intervals of $\sigma^?$. Assume $I_1$ is assigned a color $c_1 \notin C_{\omega-1}$. With probability $1/4$, configuration $T_7$ is given. For every color of $C^? \cup C^? = C_{\omega-1}$, interval $I_2$ of configuration $T_7$ overlaps an interval assigned that color. Since interval $I_2$ intersects $I_1$, it is assigned a color $c_2 \notin C_{\omega-1}$ distinct from $c_1$, thus proving the claim.

Otherwise, consider the case in which interval $I_1$ of configuration $T_1$, $T_3$, $T_5$ or $T_7$ is assigned a color of $C_{\omega-1}$, say $c'_1$. With probability $1/2$, corresponding to configurations $T_1$ and $T_3$, interval $I_2$ includes sequence $\sigma^?$. Consider the case in which $I_2$ is assigned a color $c_1 \notin C_{\omega-1}$. With probability $1/2$, corresponding to the selection of configuration $T_3$, interval $I_3$ overlaps all the intervals of $\sigma^?$ and $\sigma^?$. For every color of $C^? \cup C^? = C_{\omega-1}$, interval $I_3$ overlaps an interval assigned that color. $I_3$ also intersects interval $I_2$, which is assigned color $c_1 \notin C_{\omega-1}$. Interval $I_3$ is then assigned a color $c_2 \notin C_{\omega-1}$, thus proving the claim.

We are left to consider the case of $I_1$ and $I_2$ both assigned a color of $C_{\omega-1}$, say $c'_1$ and $c'_2$. With probability $1/2$, corresponding to configuration $T_1$, interval $I_3$ includes sequence $\sigma^?$ and interval $I_4$ includes sequence $\sigma^?$. Since $c'_1 \notin C^?$ and $c'_2 \notin C^?$, we have $C^? \cup \{c'_1\} = C_{\omega-1}$ and $C^? \cup \{c'_2\} = C_{\omega-1}$. Interval $I_3$, which overlaps $\sigma^?$ and intersects interval $I_2$, must be assigned a color $c_1 \notin C_{\omega-1}$. Interval $I_4$, which includes sequence $\sigma^?$ and intersects intervals $I_1$ and $I_3$, must be assigned a color $c_2 \notin C_{\omega-1}$, distinct from $c_1$. The claim is then proved.

C.) In this case $C^? \ne C^?$ and $C^? = C^?$. Since $s = 2$, $C^? \ne C^?$ and $C^2 \ne C^3$. With probability $2/5$, corresponding to configurations $T_1$, $T_3$, $T_5$ or $T_7$, interval $I_1$, which includes sequence $\sigma^?$, is presented. Consider the case in which interval $I_1$ is assigned a color $c_1 \notin C_{\omega-1}$. With probability $1/4$, corresponding to configuration $T_?$, interval $I_2$, which includes $\sigma^?$ and $\sigma^?$, is presented. For any color of $C_{\omega-1}$, $I_2$ includes an interval assigned that color, and it intersects interval $I_1$, which is assigned color $c_1$. $I_2$ is thus assigned a color $c_2 \notin C_{\omega-1}$ different from $c_1$, thus proving the claim.

We finally consider the case of $I_1$ assigned a color $c'_1 \in C_{\omega-1}$. With probability $1/4$, configuration $T_5$ is presented. Interval $I_2$ overlaps sequences $\sigma^?$ and $\sigma^?$. Since $C^? \cup C^? = C_{\omega-1}$, $I_2$ is assigned a new color, say $c_1$. Since $C^? = C^?$, we have $C^? \cup \{c'_1\} = C_{\omega-1}$. For every color of $C^?$, interval $I_3$ overlaps an interval of $\sigma^?$ assigned that color. $I_3$ also intersects $I_1$, assigned color $c'_1$, and $I_2$, assigned color $c_1$. Interval $I_3$ is then assigned a color $c_2 \notin C_{\omega-1}$, thus proving the claim.

$s = 3$. If $C^? = C^? = C^?$, we have that either $C^? = C^?$ or $C^? = C^?$, but $C^? \ne C^?$. Under these assumptions, the same argument used in case C.) of $s = 2$ proves the claim.

$s = 4$. In case $C^1 = C^2 = C^3 = C^4$, all the sequences are assigned the same set of colors. The same analysis as in case $c = 0$ proves the claim.

$c = 2$. In this case $|C_{\omega-1}| = 3\omega - 3$. To prove the claim, a new color must be used with probability at least $1/10$. With probability $1/10$, configuration $T_{10}$ is presented. For any color of $C_{\omega-1}$, interval $I_1$ overlaps an interval assigned that color. $I_1$ is thus assigned a new color $c_1 \notin C_{\omega-1}$.

$c \ge 3$. In this case $|C_{\omega-1}| \ge 3\omega - 2$, and the claim is proved.



We finally present the proof of Lemma 1. Let $p(\omega)$ be the probability that a deterministic algorithm uses at least $3\omega - 2$ colors on an input sequence from probability distribution $P_\omega$. Consider a probability distribution $P_\omega^j$ formed by probability distributions $P_{\omega-1}^i$, $i = 1, \ldots, 4$. By induction, we assume that the algorithm uses at least $3(\omega - 1) - 2$ colors with probability $p(\omega - 1) \ge 1 - e^{-c}$ on an input sequence $\sigma^i$ drawn from probability distribution $P_{\omega-1}^i$.

With probability $p(\omega - 1)^4$, a deterministic algorithm uses at least $3(\omega - 1) - 2$ colors for all the input sequences $\sigma^i$ drawn from probability distributions $P_{\omega-1}^i$, $i = 1, \ldots, 4$. With probability $p(\omega - 1)^4$ we are then under the assumptions of Lemma 2. We obtain from Lemma 2 that, with probability at least $\frac{1}{10}\,p(\omega - 1)^4$, the algorithm uses at least $3\omega - 2$ colors for an input sequence from $P_\omega^j$, $j = 1, \ldots, A$.

Since all the $P_\omega^j$ that form $P_\omega$ are mutually independent, the probability that a deterministic algorithm uses fewer than $3\omega - 2$ colors on all the input sequences drawn from probability distributions $P_\omega^j$, $j = 1, \ldots, A$, is upper bounded by

$$\left(1 - \frac{p(\omega-1)^4}{10}\right)^{A} \le \left(1 - \frac{(1-e^{-c})^4}{10}\right)^{A} \le e^{-A(1-e^{-c})^4/10}.$$

If $A \ge 10c/(1 - e^{-c})^4$, the given expression has value at most $e^{-c}$.

We have then proved that, with probability at least $1 - e^{-c}$, a deterministic algorithm uses at least $3\omega - 2$ colors for a sequence from probability distribution $P_\omega$, thus implying the claim of Lemma 1.

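For concreteness, the condition on $A$ and the resulting bound can be evaluated numerically. The following minimal Python sketch returns the smallest integer $A$ satisfying $A \ge 10c/(1-e^{-c})^4$, together with the upper bound $(1-(1-e^{-c})^4/10)^A$ used above. The function names are hypothetical.

```python
import math


def min_copies(c: float) -> int:
    """Smallest integer A with A >= 10c / (1 - e^{-c})^4, as required by Lemma 1."""
    return math.ceil(10 * c / (1 - math.exp(-c)) ** 4)


def failure_bound(c: float, copies: int) -> float:
    """(1 - (1 - e^{-c})^4 / 10)^A: the bound on failing in all A independent copies,
    obtained by plugging p(ω - 1) >= 1 - e^{-c} into (1 - p(ω - 1)^4 / 10)^A."""
    return (1 - (1 - math.exp(-c)) ** 4 / 10) ** copies


for c in (1.0, 2.0, 5.0):
    a = min_copies(c)
    print(f"c = {c}: A = {a}, bound = {failure_bound(c, a):.3g}, e^-c = {math.exp(-c):.3g}")
```

The bound never exceeds $e^{-c}$ because $(1-x)^A \le e^{-xA}$ and $xA \ge c$ by the choice of $A$.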


3 A lower bound for online path coloring on trees

We prove that any randomized algorithm for online path coloring on trees of diameter $\Delta = O(\log n)$ has competitive ratio $\Omega(\log \Delta)$.

We establish the lower bound using Yao's Lemma [Yao77]. We prove a lower bound on the competitive ratio of any deterministic algorithm for a given probability distribution on the input sequences for the problem.

The tree network we use for generating the input sequence is a complete binary tree of $L \ge 4$ levels. The root of the tree is at level 0, and the leaves of the tree are at level $L - 1$. The $2^l$ vertices of level $l$ are denoted by $r^l_j$, $j = 0, \ldots, 2^l - 1$. The direct ancestor of vertex $u$ is denoted by $p(u)$. We denote by $[a, b]$ the path in the tree from vertex $a$ to vertex $b$.

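One convenient concrete encoding of this tree is heap indexing: the root is 1, vertex $r^l_j$ is $2^l + j$, and $p(x) = \lfloor x/2 \rfloor$. This encoding is an assumption made only for illustration. The following minimal Python sketch implements the notation under that assumption, with hypothetical names.

```python
from typing import List


def vertex(level: int, j: int) -> int:
    """Heap index of r^level_j (root = 1; level l holds indices 2^l .. 2^(l+1) - 1)."""
    assert 0 <= j < 2 ** level
    return 2 ** level + j


def level_of(x: int) -> int:
    """Level of heap index x (the root 1 is at level 0)."""
    return x.bit_length() - 1


def parent(x: int) -> int:
    """p(x), the direct ancestor of a non-root vertex."""
    assert x > 1
    return x // 2


def is_ancestor(b: int, a: int) -> bool:
    """True if b lies on the path from a up to the root (b == a counts)."""
    shift = level_of(a) - level_of(b)
    return shift >= 0 and (a >> shift) == b


def upward_path(a: int, b: int) -> List[int]:
    """Vertices of [a, b] when b is an ancestor of a, listed from a up to b."""
    assert is_ancestor(b, a)
    path = [a]
    while path[-1] != b:
        path.append(parent(path[-1]))
    return path


def edges_of(path: List[int]) -> List[int]:
    """Edges (x, p(x)) of an upward path, each named by its lower endpoint x."""
    return path[:-1]
```

Naming each edge by its lower endpoint is convenient because, in a rooted tree, every non-root vertex has exactly one edge to its parent.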
The input sequence for the lower bound is generated in $p = \Omega(\log L)$ stages. We will prove that, at stage $i = 0, \ldots, p$, with high probability, the number of colors used by the deterministic algorithm is at least $i$. An optimal algorithm is shown to be able to color all the paths of the sequence with only 2 colors, thus proving the lower bound.

At stage $i$ of the input sequence, we concentrate on a specific level $l_i = l_{i-1} - \lceil 3^i \log … + i \log p + \log(4 \log n) \rceil$ of the tree, with $l_0 = L - 1$. It turns out that at least $4p^i \log n$ vertices of level $l_{i-1}$ are contained in the subtree rooted at every vertex of level $l_i$. To simplify notation, the $j$th vertex of level $l_i$ is denoted by $r^i_j$.

We define at stage $i$ a set of pairs $X_i = \{(u^i_j, v^i_j), j = 0, \ldots, 2^{l_i} - 1\}$, where $u^i_j$ and $v^i_j$ are two leaves of the subtree rooted at vertex $r^i_j$ of level $l_i$.

The set of pairs $X_0 = \{(r^0_j, r^0_j), j = 0, \ldots, 2^{L-1} - 1\}$ is composed of one degenerate pair for every leaf of the tree.

The set of pairs at stage $i$ is formed by selecting at random, for every vertex $r^i_j$, two pairs of stage $i - 1$ in the subtree rooted at $r^i_j$. The pair associated with vertex $r^i_j$ is formed by the second vertices of the two selected pairs. More formally:
1. For every vertex $r^i_j$ of level $l_i$, $j = 0, \ldots, 2^{l_i} - 1$:
select uniformly at random two vertices $r^{i-1}_{k_1}$ and $r^{i-1}_{k_2}$ of level $l_{i-1}$ in the subtree rooted at vertex $r^i_j$. Let $(u^{i-1}_{k_1}, v^{i-1}_{k_1}), (u^{i-1}_{k_2}, v^{i-1}_{k_2}) \in X_{i-1}$ be the two pairs associated with vertices $r^{i-1}_{k_1}$ and $r^{i-1}_{k_2}$.

2. $X_i = \{(u^i_j, v^i_j) = (v^{i-1}_{k_1}, v^{i-1}_{k_2}) : j = 0, \ldots, 2^{l_i} - 1\}$ is the set of pairs at stage $i$.

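The random construction of the pairs and of the paths $V_i$ can be simulated directly. The following Python sketch rests on several assumptions. It uses heap indexing for the complete binary tree (root 1, children $2x$ and $2x+1$). It takes the stage levels $l_0 > l_1 > \cdots \ge 1$ as an explicit input rather than computing them from the formula above. It also draws the two stage-$(i-1)$ vertices without replacement. All names are hypothetical. The sketch also counts how many generated paths use each edge, which lets one observe Lemma 3 on small instances.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def build_pairs(levels: List[int], seed: int = 0) -> List[Dict[int, Pair]]:
    """X_0, X_1, ...: X_i maps each vertex r of level levels[i] (heap index) to its pair.

    levels[0] is the leaf level L - 1; levels must strictly decrease and stay >= 1."""
    assert all(a > b for a, b in zip(levels, levels[1:])) and levels[-1] >= 1
    rng = random.Random(seed)
    leaves = range(2 ** levels[0], 2 ** (levels[0] + 1))
    X = [{x: (x, x) for x in leaves}]                       # X_0: degenerate pairs
    for prev, cur in zip(levels, levels[1:]):
        shift = prev - cur
        stage = {}
        for r in range(2 ** cur, 2 ** (cur + 1)):
            below = range(r << shift, (r + 1) << shift)     # level-`prev` vertices under r
            k1, k2 = rng.sample(below, 2)
            stage[r] = (X[-1][k1][1], X[-1][k2][1])         # second vertices of both pairs
        X.append(stage)
    return X


def stage_paths(x_i: Dict[int, Pair]) -> List[Pair]:
    """V_i as (start leaf, end vertex): the path [u^i_j, p(r^i_j)] for every pair."""
    return [(u, r // 2) for r, (u, _v) in x_i.items()]


def path_edges(a: int, b: int) -> List[int]:
    """Edges (x, x // 2) of the upward path [a, b], named by their lower endpoint x."""
    out = []
    while a != b:
        if a <= 1:
            raise ValueError("b is not an ancestor of a")
        out.append(a)
        a //= 2
    return out


def edge_load(X: List[Dict[int, Pair]]) -> Counter:
    """Number of paths of V_0 ∪ V_1 ∪ ... that use each edge."""
    return Counter(e for x_i in X for (a, b) in stage_paths(x_i) for e in path_edges(a, b))
```

For example, `edge_load(build_pairs([6, 4, 2, 1]))` never reports an edge used by more than two paths.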
The input sequence at stage $i$ is formed, for every pair $(u^i_j, v^i_j)$ of $X_i$, by a path from the first vertex of the pair to the direct ancestor of vertex $r^i_j$:

$$V_i = \{[u^i_j, p(r^i_j)] : j = 0, \ldots, 2^{l_i} - 1\}.$$

We prove in the following that an optimal algorithm serves the input sequence with two colors. We first observe:

Lemma 3. Every edge of the tree is included in at most two paths of $\bigcup_{i \ge 0} V_i$.

Proof. For every stage $i$ and every vertex $r^i_j$, denote by $E^i_j$ the set of edges in the subtree rooted at $r^i_j$ with endpoints between level $l_{i-1} - 1$ and level $l_i$, plus the edge $(r^i_j, p(r^i_j))$. (For a leaf vertex $r^0_j$, $E^0_j$ includes only the edge $(r^0_j, p(r^0_j))$.) Since all the paths of the input sequence are directed from a leaf vertex to one of its ancestors in the tree, it is sufficient to prove separately for every $r^i_j$ that every edge of $E^i_j$ is included in at most 2 paths of the input sequence.

Edges of $E^i_j$ are not included in any path of $V_{i'}$, $i' < i$. Every leaf vertex is the first endpoint of at most one path of $\bigcup_{i \ge 1} V_i$. By the construction of the input sequence, vertices $u^i_j$ and $v^i_j$ are the only leaf vertices in the subtree rooted at $r^i_j$ that may be endpoints of paths in a set $V_{i'}$, $i' \ge i$. The claim is then proved. ■

The following lemma bounds the size of the optimal solution.

Lemma 4. The optimal number of colors for any input instance from the probability distribution is 2.

Proof. We prove the claim for any input instance on a binary tree in which (i.) every path of the input instance is directed from a leaf to an ancestor of the leaf, and (ii.) every edge of the tree is included in at most two paths. The claim is proved by exhibiting a coloring of all the paths of the input sequence that uses only two colors. We proceed from the top to the bottom of the tree. Consider an internal vertex $v$ (initially the root), and let $v_1$ and $v_2$ be the two children of $v$. Consider edge $(v_1, v)$. (A similar argument holds for edge $(v_2, v)$.)

If no path of the input sequence includes both $(v_1, v)$ and $(v, p(v))$ (assume this is the case if $v$ is the root), edge $(v_1, v)$ is crossed by at most 2 paths, say $p_1$ and $p_2$, that end at $v$. Paths $p_1$ and $p_2$ are assigned the two available colors. If only one path, say $p_1$, includes both $(v_1, v)$ and $(v, p(v))$, there is at most one path, say $p_2$, including edge $(v_1, v)$ that ends at $v$. Path $p_2$ is assigned the color not given to $p_1$. If there are two paths, $p_1$ and $p_2$, including both $(v_1, v)$ and $(v, p(v))$, these have already received colors. The coloring procedure then moves on to consider vertex $v_1$. ■

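The top-down procedure in the proof of Lemma 4 is easy to implement. The following Python sketch assumes a heap-indexed binary tree (root 1, $p(x) = \lfloor x/2 \rfloor$) and represents each path by its lower endpoint and its ancestor endpoint. It names each edge $(x, p(x))$ by its lower endpoint $x$; function and variable names are hypothetical. Processing edges in increasing heap index visits them level by level from the root. As a result, the paths that continue upward through $(v, p(v))$ are always colored before the paths that end at $v$.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def two_color(paths: List[Tuple[int, int]]) -> List[int]:
    """Color upward paths (lower endpoint, proper ancestor) with colors {0, 1}, top-down.

    Edge (x, x // 2) is named x. Raises ValueError if some edge lies on more than
    two paths, i.e. if condition (ii.) of Lemma 4 fails."""
    through: Dict[int, List[int]] = defaultdict(list)   # edge -> indices of paths using it
    for idx, (a, b) in enumerate(paths):
        if a == b:
            raise ValueError("paths must contain at least one edge")
        x = a
        while x != b:
            if x <= 1:
                raise ValueError("second endpoint must be an ancestor of the first")
            through[x].append(idx)
            x //= 2
    if any(len(users) > 2 for users in through.values()):
        raise ValueError("some edge lies on more than two paths")

    color: Dict[int, int] = {}
    # Increasing heap index = level by level from the root, so every path that
    # continues above v = x // 2 was colored when edge (v, p(v)) was processed.
    for x in sorted(through):
        v = x // 2
        continuing = [i for i in through[x] if paths[i][1] != v]
        ending = [i for i in through[x] if paths[i][1] == v]
        taken = {color[i] for i in continuing}
        free = [c for c in (0, 1) if c not in taken]
        for i, c in zip(ending, free):   # at most 2 paths per edge, so enough colors remain
            color[i] = c
    return [color[i] for i in range(len(paths))]
```

Each path is colored exactly once, at its topmost edge, where it "ends at $v$" in the terminology of the proof.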
In the remainder of the section we show that the expected number of colors used by any deterministic online algorithm is $\Omega(\log L)$, thus implying the lower bound.

The following lemma will be used to prove our result. 

Lemma 5. For every pair $(u^i_j, v^i_j) \in X_i$, path $[v^i_j, p(r^i_j)]$ intersects a single path in every set $V_{i'}$, $i' \le i$.

Proof. We prove the claim by induction on the number of stages. The claim is true for pairs of stage 0. Path $[v^i_j, p(r^i_j)]$ is formed by the union of paths $[v^{i-1}_{k_2}, p(r^{i-1}_{k_2})]$ and $[p(r^{i-1}_{k_2}), p(r^i_j)]$.

If the claim holds at stage $i - 1$, path $[v^{i-1}_{k_2}, p(r^{i-1}_{k_2})]$ intersects one single path of every $V_{i'}$, $i' \le i - 1$.

Path $[p(r^{i-1}_{k_2}), p(r^i_j)]$ includes only edges of level lower than $l_{i-1} - 1$. Since no path of a set $V_{i'}$, $i' \le i - 1$, includes edges of level lower than $l_{i-1} - 1$, this part of path $[v^i_j, p(r^i_j)]$ can intersect only paths of $V_i$. It certainly intersects path $[u^i_j, p(r^i_j)] \in V_i$ on edge $(r^i_j, p(r^i_j))$. This is also the single path of $V_i$ in the subtree rooted at vertex $r^i_j$, thus showing the claim. ■

We introduce some more notation. Given a pair $(u_j^i, v_j^i) \in X_i$, let $C_j^i = \{c_0, \ldots, c_i\}$ be a set of $i+1$ colors. Color $c_k$ is defined to be the color assigned to the single path of $\mathcal{P}_k$ intersecting $[v_j^i, p(r_j^i)]$.

Pair $(u_j^i, v_j^i)$ is a good pair if $C_j^i$ is formed by $i+1$ distinct colors. We denote by $p_j^i$ the probability that pair $r_j^i$ is a good pair. We will prove that, with high probability, for any stage $i = 0, \ldots, \rho$ there exists at least one good pair of level $i$. The existence of a good pair of level $i$ is evidence that at least $i+1$ colors have been used by the deterministic algorithm, thus proving the claim.

The following claim gives a sufficient condition for a pair of level $i$ to be a good pair.



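Lemma 6. Pair $(u_j^i, v_j^i) \in X_i$ of level $i$ is a good pair if it is obtained by selecting two good pairs $(u_{j_1}^{i-1}, v_{j_1}^{i-1})$ and $(u_{j_2}^{i-1}, v_{j_2}^{i-1})$ with $C_{j_1}^{i-1} = C_{j_2}^{i-1}$.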

Proof. For every color $c \in C_{j_1}^{i-1}$, by Lemma 5, path $[u_j^i, p(r_j^i)]$ intersects a single path assigned with color $c$. Path $[u_j^i, p(r_j^i)]$ is then assigned a color $c_i \notin C_{j_1}^{i-1}$. Since $C_{j_1}^{i-1} = C_{j_2}^{i-1}$, for every color $c \in C_{j_1}^{i-1}$ path $[v_j^i, p(r_j^i)]$ intersects a path assigned with color $c$. Path $[v_j^i, p(r_j^i)]$ also intersects path $[u_j^i, p(r_j^i)]$, assigned with color $c_i$, on edge $(r_j^i, p(r_j^i))$. Pair $(u_j^i, v_j^i)$ is then a good pair with set of colors $C_{j_1}^{i-1} \cup \{c_i\}$. ■

S. Leonardi and Andrea Vitaletti

The following lemma bounds the probability that a pair is a good pair:



Lemma 7. For every pair $r_j^i$ of level $i$, $p_j^i \ge \cdots$




Proof. The proof is by induction on the number of stages. The claim is true for $i = 0$. Assume it is true for any pair of $X_{i-1}$.

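Pair $(u_j^i, v_j^i) \in X_i$ of level $i$ is obtained by selecting two pairs $\ell_1 = (u_{j_1}^{i-1}, v_{j_1}^{i-1})$ and $\ell_2 = (u_{j_2}^{i-1}, v_{j_2}^{i-1})$ of level $i-1$, with color sets $C_1 = C_{j_1}^{i-1}$ and $C_2 = C_{j_2}^{i-1}$.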

By Lemma 6, the probability that $(u_j^i, v_j^i)$ is a good pair is at least

$$p_j^i \ge \Pr[\ell_1 \text{ is a good pair}] \cdot \Pr[\ell_2 \text{ is a good pair}] \cdot \Pr[C_1 = C_2 \mid \ell_1 \text{ and } \ell_2 \text{ are good pairs}].$$

The probabilities that $\ell_1$ and $\ell_2$ are good pairs are denoted in the following by $p_1$ and $p_2$. By the inductive hypothesis, $p_1$ and $p_2$ satisfy the bound of the lemma for level $i-1$.

We are left to determine $\Pr[C_1 = C_2 \mid \ell_1 \text{ and } \ell_2 \text{ are good pairs}]$.

The event "$C_1 = C_2 \mid \ell_1$ and $\ell_2$ are good pairs" contains the event "there exist two good pairs $\ell_1, \ell_2$ with $C_1 = C_2$ in the subtree rooted at $r_j^i$" $\cap$ "the two selected good pairs $\ell_1, \ell_2$ have $C_1 = C_2$".

Since the deterministic algorithm uses at most $\rho$ colors, there are at most $\binom{\rho}{i} \le \rho^i$ distinct possible sets of colors for a good pair at stage $i-1$. The probability that there are at least two good pairs with the same set of colors is then lower bounded by the probability that there are at least $\rho^i + 1$ good pairs of level $i-1$ in the subtree rooted at $r_j^i$. This bound is established by the following claim:



Lemma 8. The probability that there are at least $\rho^i + 1$ good pairs at stage $i-1$ in the subtree rooted at a vertex $r_j^i$ is at least $1 - \frac{1}{n}$.

Proof. To establish the claim, we use Chernoff's bounds. We associate with each of the at least $4\rho^i \log n$ pairs of level $i-1$ in the subtree rooted at $r_j^i$ a $\{0,1\}$ random variable $X_k$, with $X_k = 1$ if the pair is a good pair and $X_k = 0$ otherwise. The random variables $X_k$ are independent. This follows from the following fact. Consider any two pairs of level $i-1$ in the subtree rooted at $r_j^i$, associated with vertices $r_1^{i-1}$ and $r_2^{i-1}$. Vertices $r_1^{i-1}$ and $r_2^{i-1}$ are the roots of two different subtrees, and all the paths presented until stage $i-1$ with edges in the subtree rooted at $r_1^{i-1}$ are colored independently from the paths presented until stage $i-1$ with edges in the subtree rooted at $r_2^{i-1}$. Random variable $X_k$ has value 1 with probability $p_k^{i-1}$ and value 0 with probability $1 - p_k^{i-1}$.

Under these conditions, we can use Chernoff's bounds to estimate $\Pr[x < \rho^i + 1]$. Let $x = \sum_k X_k$, $\mu = E[x] = \sum_k p_k^{i-1}$, and $\delta \in (0, 1]$. Then

$$\Pr[x < (1-\delta)\mu] < e^{-\delta^2 \mu / 2}.$$

A lower bound on the expected number of good pairs at level $i-1$ in the subtree rooted at $r_j^i$ is

$$E[x] \ge \hat{\mu} = 4\rho^i \log n,$$

obtained by multiplying the number of pairs at level $i-1$ in the subtree rooted at $r_j^i$ by a lower bound on the probability that a pair of level $i-1$ is a good pair. The following expression is easily obtained from Chernoff's bounds:



$$\Pr[x < (1-\delta)\hat{\mu}] \le \Pr[x < (1-\delta)\mu] < e^{-\delta^2 \hat{\mu}/2}.$$

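Setting $\delta = 1 - \frac{\rho^i + 1}{4\rho^i \log n}$, we obtain

$$\Pr[x < \rho^i + 1] < \exp\left(-2\rho^i \log n \left(1 - \frac{\rho^i + 1}{4\rho^i \log n}\right)^2\right) < \frac{1}{n}.$$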

Since L > 4, the claim follows from n > 15. ■ 
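The last inequality can be checked numerically. The sketch below assumes natural logarithms, writes `r` for $\rho^i \ge 1$, and takes $\delta = 1 - (\rho^i + 1)/(4\rho^i \log n)$ with $\hat{\mu} = 4\rho^i \log n$, so that $(1-\delta)\hat{\mu} = \rho^i + 1$. The function name is illustrative only.

```python
import math

def chernoff_tail_bound(r, n):
    """exp(-delta^2 * mu_hat / 2) with mu_hat = 4 r ln n and delta = 1 - (r + 1) / mu_hat.

    Here r plays the role of rho^i, so (1 - delta) * mu_hat = r + 1.
    """
    mu_hat = 4 * r * math.log(n)      # lower bound on E[x]
    delta = 1 - (r + 1) / mu_hat      # chosen so that the threshold equals r + 1
    return math.exp(-delta ** 2 * mu_hat / 2)
```

For every `r >= 1` and `n >= 16` tried below, the bound is smaller than `1/n`. For very small `n` it is not: with `r = 1` and `n = 3` it is about 0.52, which is why a lower bound on `n` is needed.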

We have shown that with probability at least $1 - \frac{1}{n}$ there exist two good pairs $\ell_1, \ell_2$ with colors $C_1 = C_2 = C$. The two selected pairs of level $i-1$ are chosen at random among all the pairs in the subtree rooted at $r_j^i$. Since there are at most $\rho^i$ distinct possible sets of colors for a good pair of level $i-1$, the probability that two good pairs have been assigned the same set of colors $C$ is at least $\cdots$. It then follows that $\Pr[C_1 = C_2 \mid \ell_1 \text{ and } \ell_2 \text{ are good pairs}] \ge \cdots$.

The probability that pair $r_j^i$ is a good pair is then bounded by

$$p_j^i \ge p_1\, p_2 \Pr[C_1 = C_2 \mid \ell_1 \text{ and } \ell_2 \text{ are good pairs}] \ge \cdots$$

The construction of the input sequence is repeated until stage $i = \rho$, such that $|I_j| = \cdots 4\rho^{\rho} \log n$. Easy computation shows $\rho = \Omega(\log \Delta)$. Lemma 8 shows that, under these assumptions, with probability at least $1 - \frac{1}{n}$ there exist more than $\rho^{\rho}$ good pairs of level $\rho - 1$. Then, with probability at least $1 - \frac{1}{n}$, a set $C_\rho$ of $\rho$ distinct colors is used by the deterministic algorithm.

Since $n > 2$, the expected number of colors used by any deterministic algorithm is at least $\rho/2$. Lemma 4 states that the optimal solution uses two colors on any of these input sequences. Thus, the lower bound on the competitive ratio of randomized algorithms is $\rho/4$. We then conclude with the following theorem:

Theorem 2. There exists an $\Omega(\log \Delta)$ lower bound on the competitive ratio of randomized algorithms for online path coloring on a tree of diameter $\Delta = O(\log n)$.

4 Conclusions 

In this paper we have presented the first randomized lower bounds for online interval graph coloring and online path coloring on tree networks. This line of research aims to establish whether there exists a specific network topology on which randomized online algorithms obtain substantially better competitive ratios than deterministic algorithms.

A first open problem is to close the gap between the randomized lower bound 
for online path coloring on trees and the best deterministic upper bound known 
for the problem. 

The lower bound for path coloring on trees is actually obtained on a 2-colorable graph. This does not preclude the existence of an algorithm that uses $\chi + O(\log \Delta)$ colors for the problem. A second open problem, posed in [BEY98], is to establish a multiplicative lower bound rather than an additive lower bound, i.e. a lower bound on a graph of arbitrarily large chromatic number.
Abstract. The Minimal Consistent Subset Selection (MCSS) problem is a discrete optimization problem whose resolution for large-scale instances requires prohibitive processing time. Prior algorithms addressing this problem are presented. Randomization and approximation techniques are well suited to the problem, so random search and meta-heuristics are proposed and discussed. In particular, Tabu Search emerges as a promising technique, and Tabu Search strategies are therefore applied and evaluated. Parallel computing helps to reduce processing time and/or produce better results; different approaches for designing parallel tabu search are analyzed.

1 Introduction 

Nearest Neighbor based decision systems used in pattern classification have the Nearest Prototype Classifier (NPC) as their simplest and most widely used classifier. Let $S = \{p_1, \ldots, p_t\} \subseteq \mathbb{R}^d$ be a labeled data set (the reference set), with each $p_i \in S$ (called a prototype) labeled as one of $c$ classes (where $t > c > 1$). The One-Nearest-Neighbor rule assigns any unlabeled object in $\mathbb{R}^d$ to the class of its nearest prototype, according to a specified metric in $\mathbb{R}^d$ (usually, but not necessarily, the Euclidean metric) [1]. Despite its simplicity, practical use of the NPC is limited by its high computational demands in the operational phase (when classifying unlabeled objects by using a reference set).
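The One-Nearest-Neighbor rule fits in a few lines of Python. This minimal sketch assumes the Euclidean metric and uses a function name of our own choosing. Each query scans every prototype, which is the source of the operational-phase cost mentioned above.

```python
import math
from typing import Sequence

Point = Sequence[float]

def nearest_prototype_label(x: Point, prototypes: Sequence[Point], labels: Sequence[int]) -> int:
    """1-NN rule: return the label of the prototype closest to x (Euclidean metric).

    Every call computes one distance per prototype, i.e. O(|S| * d) work per object.
    """
    best = min(range(len(prototypes)), key=lambda i: math.dist(x, prototypes[i]))
    return labels[best]
```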

In order to reduce the computational demands, the goal is to design a good prototype set of minimal cardinality that will ideally allow for the lowest possible error rate of the classifier. There are two options: selection, where we retain a subset (formed by S-prototypes) of the original reference set, and replacement, where the original data set is replaced by a number of labeled prototypes (referred to as R-prototypes) that do not necessarily coincide with any original prototype [2]. Selection techniques that find subsets SS guaranteeing zero errors when used to classify the original reference set S are called condensation techniques, and the produced subset is said to be consistent with S. The computational efficiency of the operational phase increases by the ratio cardinality(S) / cardinality(SS), so the effort spent in the condensation phase pays off in the operational phase, where real-time constraints appear.
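Consistency and the operational speedup can be stated directly in code. In this sketch a subset is a collection of indices into the reference set, and the helper names are hypothetical. When distances tie, `min` keeps the first index it meets.

```python
import math

def nn_label(x, prototypes, labels, subset):
    """Label of the nearest prototype among the indices in `subset` (Euclidean metric)."""
    best = min(subset, key=lambda i: math.dist(x, prototypes[i]))
    return labels[best]

def is_consistent(points, labels, subset):
    """True if classifying every sample of S with the 1-NN rule over `subset` gives zero errors."""
    return all(nn_label(points[k], points, labels, subset) == labels[k] for k in range(len(points)))

def speedup(points, subset):
    """Operational-phase gain: cardinality(S) / cardinality(SS)."""
    return len(points) / len(subset)
```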

The aim is to design a method to find a Minimal Consistent Subset. Several papers on the topic have presented algorithms that condense or reduce the given reference set while ensuring that the selected subset is consistent with the original data set, but none of them achieves the goal of minimal cardinality. Related algorithms trade off accuracy for a reduction in the number of prototypes, i.e., they select subsets of reduced cardinality (as low as one prototype per class) with the goal of the lowest resubstitution error rate. The work reported here focuses on consistent subset selection only. Exact algorithms that guarantee a minimal consistent subset (such as Branch and Bound) require exhaustive search and prohibitive processing time.

From a different point of view, the Minimal Consistent Subset Selection (MCSS) problem can be considered a discrete optimization problem (with binary variables set to 1 when a sample is selected and 0 otherwise). This makes it suitable for randomized algorithms and meta-heuristics such as genetic algorithms, simulated annealing and tabu search [3], which have proved successful in a wide range of problems, yielding good-quality solutions in reasonable times. Specifically, tabu search techniques for solving the MCSS problem are implemented, analyzed and compared with the results produced by other techniques.

Finally, parallel computing offers the advantage of reducing the execution time and 
its use can also improve the quality of the final solution. In the literature various 
approaches have been suggested for designing parallel search [4]; some of them are 
implemented and evaluated in our research. 

The remainder of this paper is organized as follows. Prior developments in 
condensation techniques are presented in Section 2. Randomization and meta-heuristics applied to the MCSS problem are discussed in Section 3, followed by a description of Tabu Search strategies and options for solving the MCSS problem in
Section 4. Different approaches incorporating parallelism into the search are also 
discussed in Section 5. Experimental results are given in Section 6, and the last 
section presents some concluding comments. 

2 Selection Algorithms (Condensation Techniques) 

2.1 Hart's Algorithm 

One of the earliest algorithms for prototype selection was presented as the "Condensed Nearest Neighbor Rule" [5] by Hart (named Hart's Algorithm in this paper). The testing process in the algorithm ensures that the subset is indeed consistent, but, as the author admits, the goal of a minimal subset is not achieved. The procedure ends up with a relatively large consistent subset and is very sensitive to the randomly picked initial selection and to the order in which the data are considered.
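Hart's procedure, including the two sources of randomness mentioned above, can be sketched as follows. This is a minimal Python illustration, not the authors' C implementation, and it assumes the Euclidean metric. With the defaults, the initial subset is one random sample and the scan order is a random permutation.

```python
import math
import random

def hart_condense(points, labels, rng=None, initial=None, order=None):
    """Hart's Condensed Nearest Neighbor rule (sketch).

    Starts from `initial` (default: one randomly picked sample) and scans the data in
    `order` (default: a random permutation), adding every sample that the current subset
    misclassifies with the 1-NN rule. Passes repeat until a whole pass adds nothing, so
    the returned subset is consistent with the full set.
    Assumes no two identical points carry different labels.
    """
    rng = rng or random.Random()
    n = len(points)
    subset = list(initial) if initial is not None else [rng.randrange(n)]
    if order is None:
        order = rng.sample(range(n), n)
    changed = True
    while changed:
        changed = False
        for k in order:
            nearest = min(subset, key=lambda i: math.dist(points[k], points[i]))
            if labels[nearest] != labels[k]:
                subset.append(k)  # only additions: the subset grows monotonically
                changed = True
    return subset
```

Different seeds for `rng` generally give different consistent subsets, which is the sensitivity noted in the text.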

Hart's elegant method has been used as a basis for many subsequent modifications that, unlike the original procedure, permit both addition and deletion of samples to and from the condensed subset. For instance, Gates [6] presented an algorithm in which the reduced set is derived by iteratively contracting the given set, with provision for the reinsertion of dropped samples until stability is reached. However, this and other proposals, while obtaining smaller subsets than Hart's algorithm at a higher computational cost, do not achieve minimality.

For these reasons, we retain Hart's algorithm as the fastest method to produce consistent subsets, and we use it at several points later in this paper.
2.2 Dasarathy's Algorithm

In 1994 (Hart presented his famous method in 1968), Dasarathy presented an algorithm [7] (named Dasarathy's Algorithm in this paper) for selecting an optimal consistent subset based on his Nearest Unlike Neighbor concept [1]. His approach claimed benefits compared to prior approaches, including that the derived subset is intended to be minimal in size.

Dasarathy's results are basically independent of the order of presentation of the samples, and hence the selected consistent subset is unique in terms of the number of samples. The author stated that the result is also unique in terms of the selected samples, but this is not strictly true: ties may appear when identifying the most-voted sample, in which case the selected sample is not determined. Consistency is guaranteed at each iteration, so one could halt the process at any desired iteration (even before the second one); in fact, most of the reduction is achieved in the first.
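The voting idea behind the Nearest Unlike Neighbor (NUN) concept can be illustrated with a short sketch. It covers one step only, under our own reading of the concept, and is not Dasarathy's complete algorithm. A sample x "votes" for every same-class sample y that is closer to x than x's NUN, because selecting y guarantees that x is classified correctly. The function names are hypothetical.

```python
import math

def nun_distances(points, labels):
    """Distance from each sample to its Nearest Unlike Neighbor (closest sample of another class)."""
    return [min(math.dist(p, q) for q, lq in zip(points, labels) if lq != lp)
            for p, lp in zip(points, labels)]

def votes(points, labels):
    """votes[y] = number of samples x of y's class lying closer to y than to x's NUN."""
    nun = nun_distances(points, labels)
    v = [0] * len(points)
    for x, (px, lx) in enumerate(zip(points, labels)):
        for y, (py, ly) in enumerate(zip(points, labels)):
            if ly == lx and math.dist(px, py) < nun[x]:
                v[y] += 1
    return v
```

In the tiny example used in the test, samples 0, 1 and 2 tie with three votes each. This is the kind of tie that leaves the selected sample undetermined.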

By comparing its results with those obtained with other approaches, the author believed that his method achieves the minimality goal in consistent subset selection, without providing a formal mathematical analysis. However, counterexamples to Dasarathy's conjecture can be given, as Kuncheva and Bezdek did in [2] when they presented a 12-element consistent subset for the popular IRIS data set (Dasarathy's algorithm finds a 15-element subset for IRIS). Our research also found a subset smaller than the one produced by Dasarathy's technique (an 11-element one).

Nevertheless, Dasarathy's algorithm is the best known algorithm in terms of consistent subset size. One of its main features is that samples are selected for their representative nature, which results in a negligible loss in recognition efficiency when the full training set is replaced by the selected consistent subset in the operational phase of testing an independent test data set. We therefore designate Dasarathy's algorithm as the best classical algorithm for the MCSS problem in what follows.

3 Randomization and Meta-Heuristics for MCSS 

Instead of designing new algorithms or modifications of the previous ones, we can view our problem as the exploration of a solution space consisting of all possible subsets of the original set, searching for a consistent subset as small as possible. We may then approach the problem with random search or with guided search in the form of meta-heuristics. We analyze the pros and cons of the different options and implement some of them.

3.1 Random Restart Procedure 

Random exploration of the space of subsets will find many subsets that turn out to be inconsistent after evaluation, wasting much effort on infeasible solutions. If we constrain the solution space to consistent subsets only, we need a procedure to randomly generate a consistent subset.

Fortunately, there is a simple method to randomly generate consistent subsets: Hart's algorithm, described in Section 2.1, which with limited effort produces relatively small subsets. Randomness in Hart's algorithm resides in the initially picked subset and in the order in which samples are considered.

We can build a random restart procedure on top of Hart's algorithm. At each restart, we randomly pick an initial subset, generate a random permutation of the data, and proceed until consistency is attained. After each iteration, we retain the current best solution. We can iterate as many times as desired, or stop when a time deadline expires.

Due to the incremental nature of Hart's algorithm (only sample additions are considered, so the subset size increases monotonically until consistency is ensured), we can implement a kind of bounding procedure. Any iteration can be halted when the subset size equals the size of the current best solution (there is no hope of improving on it), and a new iteration is started, saving computational effort.
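A sketch of Restarting Hart's algorithm with the bounding rule follows. It is written in Python for illustration and is not the authors' C code. A run is abandoned as soon as its subset reaches the size of the incumbent. The restart count stands in for the stopping rule; a time deadline would work the same way. The function names are hypothetical.

```python
import math
import random

def bounded_hart(points, labels, rng, bound):
    """One randomized run of Hart's algorithm. Returns None as soon as the subset
    reaches `bound` elements, since it can then no longer beat the incumbent."""
    n = len(points)
    subset = [rng.randrange(n)]
    order = rng.sample(range(n), n)
    changed = True
    while changed:
        changed = False
        for k in order:
            nearest = min(subset, key=lambda i: math.dist(points[k], points[i]))
            if labels[nearest] != labels[k]:
                subset.append(k)
                changed = True
                if len(subset) >= bound:
                    return None  # bounding: prune this restart
    return subset

def restarting_hart(points, labels, restarts=100, seed=0):
    """Keep the smallest consistent subset found over `restarts` bounded runs."""
    rng = random.Random(seed)
    best = list(range(len(points)))  # the full set is trivially consistent
    for _ in range(restarts):
        candidate = bounded_hart(points, labels, rng, len(best))
        if candidate is not None:
            best = candidate  # never larger than the incumbent
    return best
```

The bounding check only makes sense because Hart's algorithm never deletes a sample. A subset that has reached the incumbent's size cannot shrink later.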

Contrary to expectations and despite its naive appearance, this Restarting Hart's algorithm with the bounding procedure competes surprisingly well, as we show in the Experimental Results section.

3.2 Meta-Heuristics 

We characterize our Minimal Consistent Subset Selection problem as that of optimizing (here minimizing) an objective function f(x) subject to $x \in X$ (the solution space). As we discuss below, X may be constrained to consistent subsets, in which case the objective function is simply the subset size. Alternatively, X may consist of all possible subsets, in which case f(x) is a linear or nonlinear function of the size of subset x and the number of misclassified prototypes (the number of resubstitution errors when the original data set is classified with subset x as the Nearest Prototype Classifier).

In classical heuristic procedures, each $x \in X$ has an associated Neighborhood $N(x)$, and each $x' \in N(x)$ is reached from x by an operation called a Move. Applying the pure Local Search concept to the MCSS problem, the possible moves are sample addition and sample deletion (setting or clearing the binary variable that indicates whether sample i is present in the selected subset). Descent and Steepest Descent methods proceed iteratively from an initial solution to another (a better-evaluated neighbor, or the best-evaluated neighbor in Steepest Descent) until no immediately accessible solution improves on the last one found.
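Steepest Descent with add/delete moves can be sketched generically. A subset is represented as a `frozenset` of sample indices, and flipping sample i (symmetric difference with `{i}`) performs either an addition or a deletion. The objective `f` is left abstract here, and the names are illustrative.

```python
from typing import Callable, FrozenSet

def steepest_descent(n: int, f: Callable[[FrozenSet[int]], float],
                     start: FrozenSet[int]) -> FrozenSet[int]:
    """Pure local search over subsets of {0, ..., n-1}: a move adds or deletes one sample
    (flips one binary variable). Stops when no neighbor improves f."""
    x, fx = start, f(start)
    while True:
        neighbors = (x ^ {i} for i in range(n))  # flip sample i
        best = min(neighbors, key=f)             # best-evaluated neighbor
        if f(best) >= fx:
            return x  # local optimum: no immediately accessible improvement
        x, fx = best, f(best)
```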

The term meta-heuristic, coined by Glover in 1986, refers to a master strategy that
guides and modifies other heuristics to produce solutions beyond those that are 
normally generated in a quest for local optimality. The heuristics guided by a meta- 
strategy may be high level procedures or may embody nothing more than a 
description of available moves for transforming one solution into another, together 
with an associated evaluation rule [3]. 

The emphasis on guidance distinguishes meta-heuristics, and different interpretations of the "intelligent search" concept result in different meta-heuristics. Simulated Annealing (SA) [8] imitates a physical process in metallurgy. Genetic Algorithms (GA) [9] are based on the biological phenomenon of evolutionary reproduction (GA are also referred to as Evolutionary Computation). We paraphrase a debatable comment from Glover and Laguna claiming that the trend to associate methods with natural processes "embodies a wave of New Romanticism that [...] suggest that by mimicking the rules we imagine to operate in nature we will similarly be able to produce remarkable outcomes".

The use of Genetic Algorithms for solving the Minimal Consistent Subset Selection problem and other related Prototype Selection problems is presented in a recent study by Kuncheva and Bezdek [2]. The study shows the capability of the technique and good-quality results, but makes no reference to computational complexity or time spent (in their Experimental Results section they only state that 500 iterations are executed).

Tabu Search is an "artificial" meta-heuristic based on selected concepts that unite 
the fields of artificial intelligence and optimization. The method is based on 
procedures designed to cross boundaries of feasibility or local optimality. One of its 
main components is the use of adaptive memory, which creates a more flexible search
behavior [3][10]. 

The kind of neighborhood exploration and the use of short-term and long-term memory distinguish Tabu Search from Genetic Algorithms and Simulated Annealing, resulting in lower computational cost and better space exploration for the MCSS problem; we therefore describe the use of Tabu Search in the next section.

4 Tabu Search for the Consistent Subset Selection 

In our implementation, let X be the solution space consisting of all possible subsets of the original reference set (including both consistent and inconsistent subsets), and let the objective function f(x) to be minimized for all $x \in X$ be

$$f(x) = \mathrm{cardinality}(x) + K \cdot \mathrm{errors}(x) \qquad (1)$$

where errors(x) denotes the number of resubstitution errors resulting from the use of x as an NPC. The second term is a penalty term weighted by a constant $K \in \mathbb{R}^+$, whose value is determined in practice.
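Equation (1) translates directly into code. In this sketch we assume the Euclidean metric and adopt our own convention that an empty subset misclassifies every sample. The function names are hypothetical.

```python
import math

def resubstitution_errors(points, labels, subset):
    """Number of samples of S misclassified by the 1-NN rule over `subset`."""
    if not subset:
        return len(points)  # assumed convention: an empty subset classifies nothing correctly
    errs = 0
    for p, lp in zip(points, labels):
        nearest = min(subset, key=lambda i: math.dist(p, points[i]))
        errs += labels[nearest] != lp
    return errs

def objective(points, labels, subset, K=1.0):
    """Equation (1): f(x) = cardinality(x) + K * errors(x)."""
    return len(subset) + K * resubstitution_errors(points, labels, subset)
```

Each call does a full evaluation, one distance per (sample, prototype) pair.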

The neighborhood of each subset $x \in X$ is $N(x) \subset X$, consisting of all subsets that differ from x by a single sample addition or deletion. A Move is thus the addition or deletion of a sample to or from the current subset x.

The term Tabu Search covers many techniques and strategies, but it mainly comes from the use of short-term memories (tabu lists) that keep track of recently examined solutions in order to avoid cycling in the exploration of the space. After a move (addition or deletion) is performed, the move is declared tabu for a predetermined number of moves, i.e. it cannot be reversed until its tabu tenure expires. This means that TS is a dynamic neighborhood method, where the neighborhood of x can change according to the history of the search (this situation is referred to as a reduced neighborhood). However, a tabu move is admissible if it satisfies an aspiration criterion, usually that of improving on the best current solution.

At each step, we select the least-weight non-tabu move from those available (which may be an ascending move in some situations of the search), and use the improved-best aspiration criterion to allow a move to be considered admissible in spite of its tabu status. Tabu Search keeps the best current solution at all times and proceeds iteratively until a chosen termination criterion is satisfied (usually when the best solution has not improved in M iterations).
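The first-level Tabu Search loop described above can be sketched as follows, under these assumptions: the objective `f` is an abstract callable on `frozenset` subsets (in practice it would be equation (1)), and `tenure` and `max_no_improve` (the M above) are parameter names we chose. This is an illustration, not the authors' implementation.

```python
def tabu_search(n, f, start, tenure=15, max_no_improve=100):
    """First-level Tabu Search over subsets of {0, ..., n-1} with add/delete moves.

    - flipping sample i makes any move on i tabu for the next `tenure` iterations;
    - aspiration: a tabu move is admissible if it improves the best value found;
    - the best admissible move is always taken, even if it increases f (ascending move);
    - the search stops after `max_no_improve` iterations without a new best solution.
    """
    x = frozenset(start)
    best, best_f = x, f(x)
    tabu_until = [0] * n  # last iteration in which a move on sample i is tabu
    it = since_improve = 0
    while since_improve < max_no_improve:
        it += 1
        admissible = []
        for i in range(n):
            y = x ^ {i}  # add i if absent, delete it if present
            fy = f(y)
            if tabu_until[i] < it or fy < best_f:
                admissible.append((fy, i, y))
        if not admissible:
            break  # every move is tabu and none satisfies the aspiration criterion
        fy, i, x = min(admissible, key=lambda m: (m[0], m[1]))
        tabu_until[i] = it + tenure
        if fy < best_f:
            best, best_f = x, fy
            since_improve = 0
        else:
            since_improve += 1
    return best, best_f
```

Because the least-weight admissible move is taken even when it is worse, the walk can leave a local optimum. The tabu tenure keeps it from immediately undoing that step.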
In a second-level approach, Tabu Search includes additional mechanisms based on long-term memories to direct the search into a promising region (intensification) or toward previously unexplored regions of the solution space (diversification).

The main features of this Tabu Search implementation for the MCSS problem are presented in the following sections.

4.1 Neighborhood Exploration in TS Compared to GA 

Evaluating a new subset $x \in X$ has quadratic cost, since for each sample its nearest neighbor in x has to be determined by examining the distance to all selected prototypes. Methods relying on random or discontinuous exploration, such as Genetic Algorithms, pay this quadratic evaluation cost; in Tabu Search, evaluation time is reduced thanks to its systematic neighborhood search.

In our implementation for the MCSS problem, a sample addition move is evaluated simply by testing whether the newly inserted prototype is closer than the previous nearest neighbor for each class, resulting in linear time cost. A sample deletion move only involves recomputing the nearest neighbor of those samples whose previous nearest neighbor was the deleted one, which is better than quadratic time. Neighborhood exploration in TS is therefore more efficient than in GA, due to its pure Local Search nature.
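The incremental bookkeeping behind these cheaper move evaluations can be sketched as follows. As an assumption of this sketch, we store the nearest selected prototype of every sample. An addition then needs one distance per sample, and a deletion re-scans only the samples that had lost their nearest prototype. The class and method names are hypothetical, and moves are applied in place: to evaluate a move without committing it, one would copy the state or undo the move.

```python
import math

class NNState:
    """Incremental 1-NN bookkeeping for add/delete moves (sketch).

    nn[k] is the index of the nearest selected prototype of sample k and nd[k] its
    distance; errors() counts the resubstitution errors of the current subset.
    """

    def __init__(self, points, labels, subset):
        self.points, self.labels = points, labels
        self.subset = set(subset)
        self.nn = [None] * len(points)
        self.nd = [math.inf] * len(points)
        for k in range(len(points)):
            self._recompute(k)

    def _recompute(self, k):
        # full scan over the selected prototypes for one sample: O(|subset|)
        self.nn[k], self.nd[k] = None, math.inf
        for i in self.subset:
            d = math.dist(self.points[k], self.points[i])
            if d < self.nd[k]:
                self.nn[k], self.nd[k] = i, d

    def add(self, p):
        # linear cost: one distance computation per sample
        self.subset.add(p)
        for k in range(len(self.points)):
            d = math.dist(self.points[k], self.points[p])
            if d < self.nd[k]:
                self.nn[k], self.nd[k] = p, d

    def delete(self, p):
        # only samples whose nearest prototype was p need a new search
        self.subset.discard(p)
        for k in range(len(self.points)):
            if self.nn[k] == p:
                self._recompute(k)

    def errors(self):
        return sum(1 for k, i in enumerate(self.nn)
                   if i is None or self.labels[i] != self.labels[k])
```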

4.2 Initial Solutions and Constructive Methods 

Tabu Search may start from any initial solution. A first option is to operate on a fully constructed solution (here a consistent subset) produced with another technique (such as Hart's algorithm) and then guide transition moves to optimize the condensed subset. A second option is to start from the obvious solution, the full original reference set (which is consistent with itself), and then proceed by condensing the selected subset.

The third option uses constructive moves, themselves subject to Tabu Search guidance, to generate initial solutions. This option has significant consequences for the range of strategies available to the meta-heuristic approach and, as we verified in the experimental tests, leads to better solutions than the former options.

4.3 Constraining to Feasible Regions 

In our implementation, solution space exploration makes no distinction between consistent and inconsistent solutions except for the penalty term weighted by the constant K. Consistent solutions form several disconnected regions. If K has a high value (say 10), the search tends to remain in a local region: it avoids "stepping" on the inconsistent solutions it would have to cross to reach a different region of consistent solutions. A lower value of K (say 1) allows this boundary crossing, relying on the penalty term to drive the search back towards consistent solutions (experimental tests confirm that this is the case).
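The effect of K can be seen from how a single move changes f in equation (1). Take a deletion that shrinks the subset by one but introduces one resubstitution error. With K = 10 the move raises f by 9, a steep step the search tends to avoid. With K = 1 the move is neutral, so crossing into the inconsistent region is cheap. The helper name is illustrative.

```python
def delta_f(size_change, error_change, K):
    """Change of f(x) = cardinality(x) + K * errors(x) caused by a single move."""
    return size_change + K * error_change
```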

An alternative approach is to constrain the neighborhood to moves among consistent subsets (feasible solutions) only. To still encompass infeasible solutions, the search may be strategically driven across the feasibility boundary by deleting samples whose deletion produces inconsistency. After a selected depth is reached (a certain number of samples have been dropped), the search changes direction, adding samples to drive back toward feasibility, i.e. a consistent solution. A planned use of this technique, called Strategic Oscillation, will make it possible to visit the different disconnected feasible regions.
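One oscillation cycle can be sketched as a simplified illustration of the idea, not the authors' planned procedure. Starting from a consistent subset, the search makes `depth` greedy deletions into the infeasible region and then greedy additions until consistency is restored. The names `oscillate` and `errors` are hypothetical, and we assume the full reference set is consistent so that the constructive phase always terminates.

```python
def oscillate(n, errors, x, depth):
    """One strategic-oscillation cycle over subsets of {0, ..., n-1} (simplified sketch).

    errors(subset) -> number of resubstitution errors; x must be consistent (errors(x) == 0)
    and the full set is assumed consistent, so the constructive phase always terminates.
    """
    x = frozenset(x)
    # Destructive phase: cross the feasibility boundary with `depth` deletions, each chosen
    # to create as few errors as possible (ties broken by the smallest sample index).
    for _ in range(min(depth, len(x))):
        x = min((x - {i} for i in sorted(x)), key=errors)
    # Constructive phase: turn back, each time adding the sample that leaves the fewest
    # errors, until the subset is consistent again.
    while errors(x) > 0:
        x = min((x | {i} for i in range(n) if i not in x), key=errors)
    return x
```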

4.4 Intensification and Diversification 

Beyond the first-level Tabu Search approach, the use of longer-term memory makes it possible to better explore promising solutions or regions (intensification phase) or to explore less visited regions (diversification phase). Two intensification procedures are proposed here. The simplest one is to maintain a list of best solutions and restart first-level TS from one of these solutions after clearing all the tabu lists that were reducing the neighborhood when this solution was found.

Another intensification procedure for the MCSS problem is to combine the best solutions into a subset containing the most frequently selected prototypes (possibly reduced to those prototypes present in all best solutions). If the resulting subset is inconsistent, the procedure continues with constructive moves followed by the local TS phase.
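Combining the best solutions can be sketched as a vote over an elite list. The `min_share` parameter is our own illustrative knob: a value of 1.0 keeps only the prototypes present in every elite solution. If the combined subset turns out to be inconsistent, constructive moves and the local TS phase would follow, as described above.

```python
from collections import Counter

def combine_elite(elite, min_share=0.5):
    """Keep the prototypes selected in at least `min_share` of the elite solutions.

    min_share=1.0 keeps only the prototypes present in every elite solution.
    """
    counts = Counter(i for sol in elite for i in set(sol))
    return {i for i, c in counts.items() if c >= min_share * len(elite)}
```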

The diversification phase uses a memory containing information about the solutions visited since the beginning of the search. As in [14], we use an array V in which V[i] is the number of iterations in which sample i is selected. To generate a diversified solution, only samples whose V[i] value is below a threshold are included in the diversified solution.
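The frequency memory V and the construction of a diversified solution reduce to two small helpers. The sketch below assumes that V is a plain list indexed by sample and that the threshold is supplied by the caller; both helper names are illustrative.

```python
def update_frequencies(V, subset):
    """Long-term memory: V[i] counts the iterations in which sample i was selected."""
    for i in subset:
        V[i] += 1

def diversified_solution(V, threshold):
    """Include only the rarely selected samples (V[i] below the threshold)."""
    return {i for i, v in enumerate(V) if v < threshold}
```

In a full search, `update_frequencies` would be called once per iteration with the current subset.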

4.5 Additional TS Options 

Tabu Search comprises many techniques and strategies adaptable to the MCSS problem. Examples are Asymmetrical Tabu Tenures (a longer tabu tenure for sample addition than for sample deletion, since while optimizing there are many more samples to add than to delete), the Candidate List Strategy, which narrows the examination of the elements of a neighborhood in order to achieve an effective tradeoff between the quality of a move and the effort expended on it, and One-Sided Strategic Oscillation, which keeps the search predominantly in the feasible region. The use of these and other TS techniques is beyond the scope of this study and will be explored in the near future.

5 Parallelization Strategies 

The high-performance computing potential offered by parallel computers suggests using them to solve optimization problems with computationally intensive exact algorithms such as Branch and Bound [11]. However, solving large problems requires a great amount of time even with efficient parallelization and a high number of processing elements.

Extensive literature is available on parallel search algorithms for discrete optimization [12]. Here we only analyze the parallelization of the randomization and meta-heuristic techniques presented in this paper. Different sources of parallelism exist in the Tabu Search algorithm. Four of these sources are:

1. parallelism in cost function evaluation

2. parallelism in problem decomposition

3. parallelism in neighborhood examination

4. parallelism in solution domain exploration by different search paths

The first source represents a low-level approach, and the second one is not applicable to the MCSS problem. The last two sources of parallelism are studied in detail below.

5.1 Parallelizing Restarting Hart's Algorithm 

Parallelization of the random restart procedure presented in Section 3.1 is straightforward. Owing to its low communication requirements, it suits distributed architectures and asynchronous schemes. Initially, all processing elements (PEs) set a local variable best_value to the original set size. Each processing element simply runs the restarting Hart's algorithm bounded by this best_value. If an iteration results in a solution smaller than best_value, the PE saves this solution in local memory and broadcasts its size to all other PEs. After each iteration (improving or not), a non-blocking read is performed to receive improvements, if any, and best_value is updated to the current lowest value. After a global termination criterion is satisfied (by means of centralized control), the best solution is transferred from the local memory that holds it.
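The asynchronous scheme can be sketched with Python threads standing in for PEs and per-worker queues standing in for messages. This is an illustration under stated assumptions, not the authors' setup. `one_restart(bound, rng)` is a hypothetical stand-in for one bounded run of Hart's algorithm that returns a solution list or `None`. Threads only model the communication pattern; real speedups would need processes or a message-passing library.

```python
import queue
import random
import threading

def parallel_restarts(one_restart, n_workers, set_size, iterations, seed=0):
    """Asynchronous parallel random restart (sketch of the scheme above).

    one_restart(bound, rng) -> solution list or None is a stand-in for one bounded run of
    Hart's algorithm. Each worker keeps a local best_value, broadcasts the size of every
    improvement, and polls its inbox with non-blocking reads after each iteration.
    """
    inboxes = [queue.Queue() for _ in range(n_workers)]
    local_best = [None] * n_workers  # each worker's "local memory"

    def worker(w):
        rng = random.Random(seed + w)
        best_value = set_size
        for _ in range(iterations):
            sol = one_restart(best_value, rng)
            if sol is not None and len(sol) < best_value:
                best_value = len(sol)
                local_best[w] = sol
                for other, box in enumerate(inboxes):
                    if other != w:
                        box.put(best_value)  # broadcast the size only
            while True:  # non-blocking read of any received improvements
                try:
                    best_value = min(best_value, inboxes[w].get_nowait())
                except queue.Empty:
                    break

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # centralized termination: fetch the best solution from the worker that holds it
    found = [s for s in local_best if s is not None]
    return min(found, key=len) if found else None
```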

5.2 Single-Walk Tabu Search 

Parallelism in neighborhood examination is also called single-walk search: only a single walk in the solution space is carried out. Since the search for the best move at each iteration is a computationally intensive task, the moves to be evaluated are distributed over a number of processors. In our parallel TS implementation for the MCSS problem, each processor is responsible for evaluating the state reversal of a group of prototypes. All processors work on the same current solution. A master processor receives the best evaluated move from each processor, selects the best one, and then communicates the move to the slave processors, which perform it locally. TS variables and tabu lists remain local to each processor.
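One single-walk step can be sketched with a thread pool standing in for the processors. Prototypes are split round-robin into groups, each worker reports the best admissible reversal in its group, and the master keeps the overall best. The `admissible(i, fy)` callback is a hypothetical stand-in for the tabu status and aspiration test, and the partitioning scheme is our own choice.

```python
from concurrent.futures import ThreadPoolExecutor

def best_move_parallel(n, f, x, admissible, n_workers=4):
    """One single-walk step: every worker evaluates the state reversal (add/delete) of its
    own group of prototypes on the shared current solution x and reports its best
    admissible move; the master then picks the overall best.

    Returns (f value, sample index), or None if no move is admissible.
    """
    groups = [range(w, n, n_workers) for w in range(n_workers)]  # round-robin partition

    def evaluate(group):
        best = None
        for i in group:
            fy = f(x ^ {i})
            if admissible(i, fy) and (best is None or (fy, i) < best):
                best = (fy, i)
        return best

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        reports = [r for r in pool.map(evaluate, groups) if r is not None]
    return min(reports) if reports else None  # master selection
```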

This simple parallelization of sequential tabu search produced good performance in optimization problems such as Task Scheduling under Precedence Constraints [13]. Its benefits for the problem at hand remain to be assessed.

5.3 Multiple-Walk Tabu Search 

When using parallelism in solution domain exploration, different parallel processes, called parallel search threads, are created and distributed over the available PEs. Each search thread executes a TS algorithm from an initial solution (which may be the same or different for each thread) using a set of local parameters. This set of parameters determines the TS behavior by specifying a Strategy (which, again, may be the same for all threads or differ among them).

The simplest approach is to perform multiple independent walks. In a better 
coordinated Job, parallel search threads have the possibility to exchange information, 
then called interacting walks. In this case it should be decided the nature of 
information to be exchanged, such as the occurrence of improved solutions or the 
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availability of promising paths, and the way to use the additional information 
resulting from interaction of walks. Interesting results of this approach for the 
Multidimensional Knapsack Problem are presented in [14] 

6 Experimental Results 

We have implemented the following techniques in C language: 

• Dasarathy's algorithm described in section 2.2. 

• Restarting Hart's algorithm as described in section 3.1. 

• Tabu Search for the MCSS problem as described in section 4. 

For comparison purposes, three different termination criteria are implemented for
Restarting Hart's and Tabu Search: termination when a solution of given quality is
reached, termination at a given time deadline, or termination after a given number of
moves without improvement (standard termination criterion).
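The three termination criteria can be bundled into a single check, sketched below. It assumes that solution quality is measured by subset size, with smaller being better. The class and field names are hypothetical. Setting a field to `None` disables that criterion, and the default of 100 idle moves only mirrors the stopping criterion used in the IRIS experiment of section 6.1.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Termination:
    """Hypothetical bundle of the three stopping rules (smaller subset = better)."""
    target_size: Optional[int] = None     # stop once a subset this small is found
    deadline_s: Optional[float] = None    # stop once this much time has elapsed
    max_idle_moves: Optional[int] = 100   # standard rule: moves without improvement

    def should_stop(self, best_size, elapsed_s, idle_moves):
        if self.target_size is not None and best_size <= self.target_size:
            return True
        if self.deadline_s is not None and elapsed_s >= self.deadline_s:
            return True
        if self.max_idle_moves is not None and idle_moves >= self.max_idle_moves:
            return True
        return False
```

A search loop would call `should_stop` once per move, passing the current best size, the elapsed wall-clock time and the number of moves since the last improvement.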

6.1 Tabu Search Capacities 

Even a first-level Tabu Search approach suffices to show the capabilities of Tabu
Search. In our first experiment, we used the popular IRIS data set comprising 150
labeled samples in $\mathbb{R}^4$, 50 from each of three physically labeled subspecies of
IRIS flowers. Dasarathy's technique finds a 15-element consistent subset.

The smallest known consistent subset for the IRIS set is an 11-element one, obtained by
Tabu Search with K=1 (the constant in the penalty term of the objective function), a
tabu tenure of 15 moves and a stopping criterion of 100 moves without improving the
best solution (in fact, the best solution is found in the 279th move), starting from a
randomly picked subset (one sample per class) and using constructive moves in tabu
search style. This solution improves on the best known result in the literature, a
12-element subset [2].

6.2 Parallelization Results 

For time considerations and to assess the benefits of parallelization, we use a larger
data set in further experiments, consisting of 500 samples from two classes in
$\mathbb{R}^2$ (the sample dimension is not relevant for the condensation phase since
all distances may be computed in advance).

We have implemented the following parallel versions:

• Parallel Restarting Hart's algorithm as described in section 5.1.

• Single-Walk Tabu Search with parallel neighborhood evaluation (see section 5.2).

Multiple-Walk Tabu Search is currently under development. For the parallel
versions we developed a program in C and used a message-passing programming
model based on the PVM library. The parallel architecture used during the tests is an
SGI Origin 2000 with 64 R10000 processor nodes. Applications were compiled with the
native SGI PVM library. To generalize the results, times are expressed in relative
units.
Fig. 1. Time comparison for solutions of the same or better quality than Dasarathy's
algorithm for the 500-prototype test set (curves: Restarting Hart, Tabu Search)

Fig. 2. Solution quality comparison for limited time (Dasarathy's time) for the test data
set (curves: Dasarathy, Restarting Hart, Tabu Search)
Our second experiment uses termination by a given solution quality, stopping when
Dasarathy's result is improved. Figure 1 shows that our Restarting Hart's algorithm
comfortably competes with the elaborate Dasarathy procedure, overtaking it with just
two processors.

Our third experiment uses termination by deadline expiration, with the deadline set to
the execution time of the sequential Dasarathy's algorithm. Figure 2 shows that
Restarting Hart's quickly gets better results, but additional computational effort
drives Tabu Search to the best solutions.

Finally, Table 1 shows the quality of the best solution from each technique over a
longer time, proceeding until the termination criterion is satisfied, chosen as a
reasonable number of iterations without improvement. Computational effort is measured
here in user time for a sequential run on a single processor.

Table 1. Best solution from each technique, for a 500-prototype test data set

Used technique       Subset size   Computational effort (relative time)
Dasarathy's          38            1.00
Restarting Hart's    26            22.17
Tabu Search          19            31.49

7 Discussion and Future Work 

Our results for the MCSS problem on the IRIS data set seem to improve on those
obtained by the Genetic Algorithms presented in [2], which resulted in 10- and
11-element subsets with one resubstitution error and a 12-element consistent subset,
while the TS presented here resulted in an 11-element consistent subset.

Our experiments are merely illustrative, given the number of runs; their purpose is to
show the capabilities of the techniques, not to analyze robustness, convergence, etc.
The main points are that a random restarting procedure such as the presented
Restarting Hart's algorithm easily obtains good-quality solutions for the MCSS
problem, and that the Tabu Search metaheuristic obtains better results than respectable
algorithms (and than popular genetic algorithms) in reasonable times.

Some MCSS techniques require specifying the number of prototypes in advance, or
they converge to a set whose cardinality cannot be specified or changed as desired.
In Tabu Search (and in GA [2] too), the optimal number of prototypes is decided in
the course of the computation. As we showed, Tabu Search has many options and
degrees of freedom to embed any kind of desired requirement.

Last, parallel implementations allow both reducing execution time and obtaining
better solutions. As future work, parallel cooperative (Multiple-Walk) Tabu Search
for the MCSS problem should be implemented and analyzed in depth, and compared
to other approaches such as parallel genetic algorithms [15] adapted for the MCSS
problem.
Abstract. In this paper, we focus on the complexity analysis of three simulated
annealing-based cooling schedules applied to the classical, general job shop
scheduling problem. The first two cooling schedules are used in heuristics which
employ a non-uniform neighborhood relation. The expected run-time can be estimated
by … for the first and … for the second cooling schedule, where $n$ is the number of
tasks, $m$ the number of machines and $\varepsilon$ represents $O(\ln\ln n/\ln n)$. The
third cooling schedule utilizes a logarithmic decremental rule. The underlying
neighborhood relation is non-reversible and therefore previous convergence results on
logarithmic cooling schedules are not applicable. Let $l_{\max}$ denote the maximum
number of consecutive transition steps which increase the value of the objective
function. We prove a run-time bound of $O(\log^{\ldots} 1/\delta) + \ldots$ to approach
with probability $1-\delta$ the minimum value of the makespan. The theoretical analysis
has been used to attack famous benchmark problems. We could improve five upper
bounds for the large unsolved benchmark problems YN1, YN4, SWV12, SWV13 and
SWV15. The maximum improvement has been achieved for SWV13 and shortens the gap
between the lower and the former upper bound by about 57%.



1 Introduction 

In the job shop scheduling problem, $n$ jobs have to be processed on $m$ different
machines. Each job consists of a sequence of tasks that have to be processed during
an uninterrupted time period of a fixed length on a given machine. A schedule is an
allocation of the tasks to time intervals on the machines, and the aim is to find a
schedule that minimizes the overall completion time, which is called the makespan.
This scheduling problem is NP-hard [7, 18], and there exist problem specifications
which are even hard to approximate within a polylogarithmic distance to the optimum
solution [23]. To find a schedule that is shorter than 5/4 times the optimum is also
NP-hard for the general problem setting [22].
In the present paper, we concentrate on the complexity analysis of three simulated
annealing-based cooling schedules applied to the general job shop problem. The first
two cooling schedules are used in heuristics which employ a detailed analysis of the
objective function. For these heuristics we introduced a non-uniform neighborhood
relation with biased generation probabilities of neighbors. Preference is given to
transitions where a decrease of longest paths is most likely. The expected run-time
can be estimated by $O(n^{\ldots+\varepsilon})$ for the first and … for the second
cooling schedule, where $\varepsilon$ represents $O(\ln\ln n/\ln n)$.

The third cooling schedule utilizes a logarithmic decremental rule. Together with a
neighborhood relation introduced by Van Laarhoven et al. in [21], we obtain a
stochastic algorithm with a provable convergence rate. The neighborhood relation
determines a landscape of the objective function over the configuration space
$\mathcal F$ of feasible solutions of a given job shop scheduling problem. The general
framework of logarithmic cooling schedules has been studied intensely, e.g., by
B. Hajek [8] and O. Catoni [3,4]. To analyze the convergence rate, they utilize specific
symmetry properties of the configuration space with respect to the underlying
neighborhood relation. Our chosen neighborhood relation does not provide these
symmetry properties, but nevertheless we could perform a convergence analysis of
the corresponding stochastic algorithm.

Let $a_S(k)$ denote the probability to obtain the schedule $S \in \mathcal F$ after $k$
steps of a logarithmic cooling schedule. The non-reversible neighborhood from [21] a
priori ensures that transitions always result in a feasible solution. Therefore, the
problem is to find an upper bound for $k$ such that
$\sum_{S \in \mathcal F_{\min}} a_S(k) > 1 - \delta$, i.e., for schedules $S$ minimizing
the makespan. Our convergence result, i.e., the upper bound of the number of steps
$k$, is based on a very detailed analysis of transition probabilities between
neighboring elements of the configuration space $\mathcal F$. We obtain a run-time of
$O(\log^{\ldots} 1/\delta) + \ldots$ to have with probability $1 - \delta$ a schedule with
the minimum value of the makespan, where $l_{\max}$ is a parameter for the energy
landscape. The present approach has been briefly outlined in the context of
equilibrium computations in specific physical systems [2].

The theoretical analysis has been used to attack famous benchmark problems. 
We could improve five upper bounds for the large unsolved benchmark problems 
YN1, YN4, SWV12, SWV13 and SWV15. The maximum improvement has been
achieved for SWV13 and shortens the gap between the lower and the former 
upper bound by about 57%. 

2 The Job Shop Problem 

The general job shop scheduling problem can be formalized as follows. There are a
set $\mathcal J$ of $l$ jobs, a set $\mathcal M$ of $m$ machines, and a set $\mathcal T$ of
$n$ tasks. For each task $t \in \mathcal T$ there is a unique job $J(t) \in \mathcal J$ to
which it belongs, a unique machine $M(t) \in \mathcal M$ on which it requires
processing, and a processing time $p(t) \in \mathbb N$. There is a binary relation $R$
on $\mathcal T$ that decomposes $\mathcal T$ into chains corresponding to the jobs. This
binary relation, which represents precedences between the tasks, is defined as
follows: for every $t \in \mathcal T$ there exists at most one $t'$ such that
$(t, t') \in R$. If $(t, t') \in R$, then $J(t) = J(t')$ and there is no
$x \notin \{t, t'\}$ such that $(t, x) \in R$ or $(x, t') \in R$. For any
$(v, w) \in R$, $v$ has to be performed before $w$. $R$ induces a total ordering of the
tasks belonging to the same job. There exist no precedences between tasks of
different jobs. Furthermore, if $(v, w) \in R$ then $M(v) \ne M(w)$. A schedule is a
function $S : \mathcal T \to \mathbb N \cup \{0\}$ that for each task $t$ defines a
starting time $S(t)$.

Definition 1 A schedule is feasible if

$\forall v, w \in \mathcal T: (v, w) \in R \Rightarrow S(v) + p(v) \le S(w)$,

$\forall v, w \in \mathcal T, v \ne w: M(v) = M(w) \Rightarrow S(v) + p(v) \le S(w) \vee S(w) + p(w) \le S(v)$.
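Definition 1 translates directly into a check on start times. The sketch below assumes that tasks are hashable identifiers and that start times, processing times and machines are given as dictionaries. The function names are hypothetical, and `makespan` computes the length of equation (1).

```python
def is_feasible(S, p, M, R):
    """Check Definition 1.

    S, p, M: dicts task -> start time, processing time, machine.
    R: iterable of job precedence pairs (v, w), v before w.
    """
    # precedence constraints: S(v) + p(v) <= S(w)
    if any(S[v] + p[v] > S[w] for v, w in R):
        return False
    # machine capacity: two tasks on the same machine must not overlap
    tasks = list(S)
    for a in range(len(tasks)):
        for b in range(a + 1, len(tasks)):
            v, w = tasks[a], tasks[b]
            if M[v] == M[w] and not (S[v] + p[v] <= S[w] or S[w] + p[w] <= S[v]):
                return False
    return True

def makespan(S, p):
    """Equation (1): the earliest time at which all tasks are completed."""
    return max(S[v] + p[v] for v in S)
```

The pairwise machine check is quadratic in the number of tasks, which is fine for illustration. In practice the disjunctive graph below avoids checking start times altogether.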

The length, respectively the makespan, of a schedule $S$ is defined by

(1)  $\lambda(S) := \max_{v \in \mathcal T} \big(S(v) + p(v)\big)$,

i.e., the earliest time at which all tasks are completed. The problem is to find an
optimal schedule, i.e., a feasible schedule of minimum length. Minimizing the
makespan $\lambda(S)$ in a job shop scheduling problem with no recirculation can be
represented by a disjunctive graph, a model introduced by Roy and Sussmann in [16].
The disjunctive graph is a graph $G = (V, A, E, \mu)$, which is defined as follows:

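$V = \mathcal T \cup \{I, O\}$,

$A = \{[v, w] \mid v, w \in \mathcal T, (v, w) \in R\} \cup \{[I, w] \mid w \in \mathcal T, \nexists v \in \mathcal T: (v, w) \in R\} \cup \{[v, O] \mid v \in \mathcal T, \nexists w \in \mathcal T: (v, w) \in R\}$,

$E = \{\{v, w\} \mid v, w \in \mathcal T, v \ne w, M(v) = M(w)\}$,

$\mu : V \to \mathbb N \cup \{0\}$.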

The vertices in $V$ represent the tasks. In addition, there are a source ($I$) and a
sink ($O$), which are two dummy vertices. All vertices in $V$ are weighted. The weight
$\mu(v)$ of a vertex is given by the processing time, $\mu(v) := p(v)$
($\mu(I) = \mu(O) = 0$). The arcs in $A$ represent the given precedences between the
tasks. The edges in $E$ represent the machine capacity constraints, i.e.,
$\{v, w\} \in E$ with $v, w \in \mathcal T$ and $M(v) = M(w)$ denotes the disjunctive
constraint, and the two ways to settle the disjunction correspond to the two possible
orientations of $\{v, w\}$. The source $I$ has arcs emanating to all the first tasks of
the jobs, and the sink $O$ has arcs coming from all final tasks of jobs.

An orientation on $E$ is a function $\delta : E \to \mathcal T \times \mathcal T$ such that
$\delta(\{v, w\}) \in \{(v, w), (w, v)\}$ for each $\{v, w\} \in E$. A feasible schedule
corresponds to an orientation $\delta$ on $E$ ($\delta(E) = \{\delta(e) \mid e \in E\}$) for
which the resulting directed graph (called digraph) $D := G' = (V, A, E, \mu, \delta(E))$ is
acyclic.

A path $P$ from $x_i$ to $x_j$, $i, j \in \mathbb N$, $i < j$, $x_i, x_j \in V$, of the
digraph $D$ is a sequence of vertices $(x_i, \ldots, x_j) \in V$ such that for all
$i \le k < j$, $[x_k, x_{k+1}] \in A$ or $(x_k, x_{k+1}) \in \delta(E)$.

The length of a path $P(x_i, x_j)$ is defined by the sum of the weights of all vertices
in $P$: $\lambda(P(x_i, x_j)) = \sum_{k=i}^{j} \mu(x_k)$. The makespan of a feasible
schedule is determined by the length of a longest path (i.e., a critical path) in the
digraph $D$. The problem of minimizing the makespan can therefore be reduced to
finding an orientation $\delta$ on $E$ that minimizes the length $\lambda(P_{\max})$.
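The makespan equals the length of a longest vertex-weighted path in $D$, and $D$ must be acyclic for the orientation to be feasible, so both can be obtained in one topological sweep. The sketch below uses Kahn's algorithm, which is our choice and not necessarily the authors'. Here `weights` maps each vertex of $V$ (including $I$ and $O$) to $\mu(v)$, and `arcs` holds the conjunctive arcs $A$ together with the oriented machine arcs $\delta(E)$. The function name is hypothetical.

```python
from collections import defaultdict, deque

def longest_path_length(weights, arcs):
    """λ(P_max) of a vertex-weighted digraph, or None if it contains a cycle
    (i.e. the orientation δ is infeasible)."""
    succ = defaultdict(list)
    indeg = {v: 0 for v in weights}
    for u, v in arcs:
        succ[u].append(v)
        indeg[v] += 1
    # best[v] = length of a longest path ending in v (weight of v included)
    best = {v: weights[v] for v in weights}
    queue = deque(v for v in weights if indeg[v] == 0)
    processed = 0
    while queue:
        u = queue.popleft()
        processed += 1
        for v in succ[u]:
            best[v] = max(best[v], best[u] + weights[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if processed < len(weights):
        return None  # vertices left unprocessed lie on or behind a cycle
    return max(best.values())
```

Returning `None` on a cycle doubles as a feasibility test, so a single sweep both rejects an infeasible orientation and yields $\lambda(P_{\max})$.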

3 Basic Definitions 

Simulated annealing algorithms act within a configuration space in accordance with a
certain neighborhood structure or a set of transition rules, where the particular steps
are controlled by the value of an objective function. The configuration space, i.e., the
set of feasible solutions of a given problem instance, is denoted by $\mathcal F$. For all
instances, the number of tasks of each job equals the number of machines, and each
job has precisely one operation on each machine. In that case, the size of
$\mathcal F$ can be upper bounded in the following way: in the disjunctive graph $G$
there are at most $l!$ possible orientations to process $l$ tasks on a single machine.
Therefore, we have $|\mathcal F| \le (l!)^m$.

To describe the neighborhood of a solution $S \in \mathcal F$, we define a neighborhood
function $\eta : \mathcal F \to \wp(\mathcal F)$. The neighborhood of $S$ is given by
$\eta(S) \subseteq \mathcal F$, and each solution in $\eta(S)$ is called a neighbor of $S$.
Van Laarhoven et al. [21] propose a neighborhood function which is based on
interchanging two adjacent operations of a block. A block is a maximal sequence of
adjacent operations that are processed on the same machine and belong to a longest
path. We will use an extension of this neighborhood where changing the orientation of
a larger number of arcs is allowed within a path related to a single machine:

1. Choosing two vertices $v$ and $w$ such that $M(v) = M(w) = k$ and there exists a path $P(v, w)$ with
$\forall x \in P(v, w): M(x) = k$ and $(v, x), (x', w) \in P_{\max}$ for $x, x' \in P$;

2. Reversing the order of the path $P(v, w)$ such that
$\forall (x_i, x_j) \in P(v, w): (x_i, x_j) \in \delta(E) \Rightarrow (x_j, x_i) \in \delta'(E)$;

3. If there exists an arc $(u, v)$ such that $v \ne u$, $M(u) = k$, then replace the arc $(u, v)$ by $(u, w)$;

4. If there exists an arc $(w, x)$ such that $w \ne x$, $M(x) = k$, then replace the arc $(w, x)$ by $(v, x)$.

Thus, the neighborhood structure is characterized by 

Definition 2 The schedule $S'$ is a neighbor of $S$, $S' \in \eta(S)$, if $S'$ can be
obtained by the transition rules 1-4.
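If each machine's processing order is stored as a sequence, rules 2-4 amount to reversing a contiguous segment of that sequence: the arc into $v$ is redirected to $w$, and the arc out of $w$ now leaves from $v$. Below is a minimal sketch under that representation, with hypothetical helper names. It does not check the rule 1 condition that the segment lies on a longest path, and the resulting orientation must still be tested for acyclicity.

```python
def reverse_machine_path(machine_seq, k, i, j):
    """Rules 2-4 on machine k: reverse the tasks at positions i..j
    (v = seq[i], w = seq[j]).

    machine_seq maps each machine to its processing order; a new dict is returned.
    """
    seq = machine_seq[k]
    if not 0 <= i < j < len(seq):
        raise ValueError("need 0 <= i < j < number of tasks on machine k")
    new_seq = {m: list(order) for m, order in machine_seq.items()}
    # Reversing the segment turns (u, v) into (u, w) and (w, x) into (v, x).
    new_seq[k][i:j + 1] = seq[i:j + 1][::-1]
    return new_seq

def machine_arcs(machine_seq):
    """Arcs of δ(E) between consecutive tasks on each machine."""
    return {(order[t], order[t + 1])
            for order in machine_seq.values()
            for t in range(len(order) - 1)}
```

Only arcs between consecutive tasks are listed, because they imply the remaining orientations on a machine and are all a longest-path computation needs.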

The transition rules do not guarantee a priori that the resulting schedule is feasible,
i.e., that the corresponding digraph is acyclic. Therefore, a test of feasibility has to be
performed after any proposed transition. However, the feasibility test can be combined
with the computation of the length $\lambda(P_{\max})$, which has to be done for any
transition anyway. For the special case of $\eta_L$, Van Laarhoven et al. have proved
the following

Theorem 1 [21] For each schedule $S \notin \mathcal F_{\min}$, there exists a finite
sequence of transitions leading from $S$ to an element of $\mathcal F_{\min}$.
As already mentioned in Section 2, the objective is to minimize the makespan of
feasible schedules. Hence, we define $Z(S) := \lambda(P_{\max})$, where $P_{\max}$ is a
longest path in $D(S)$. Furthermore, we set

(2)  $\mathcal F_{\min} := \{S \mid S \in \mathcal F \text{ and } \forall S' (S' \in \mathcal F \rightarrow Z(S') \ge Z(S))\}$.



We introduce biased generation probabilities which give preference to transitions
where a decrease of longest paths is most likely, i.e., the selection of $v, w$ will
depend on the number of longest paths in $D$ to which $a = (v, x_i)$ and
$b = (x_j, w)$ with $M(x_i) = M(x_j) = k$ belong.

In case of adjacent $v$ and $w$, i.e., if there is an arc $(v, w) \in \delta_S(E)$ and there
exist arcs $(u, v)$ and $(w, x)$ such that $v \ne u$, $w \ne x$, the transition $\eta_L$
introduced by Van Laarhoven et al. [21] will be executed. Therefore, the transition
$\eta_L$ is a special case of our neighborhood function.

In simulated annealing, the transitions between neighboring elements depend on the
objective function $Z$. Given a pair of feasible solutions $[S, S']$, $S' \in \eta(S)$, we
denote by $G[S, S']$ the probability of generating $S'$ from $S$, and by $A[S, S']$ the
probability of accepting $S'$ once it has been generated from $S$. Since we consider a
single step of transitions, the value of $G[S, S']$ depends on the set $\eta(S)$. In most
cases, a uniform probability with respect to $S$ is taken, which is given by
$|\eta(S)|^{-1}$. In our approach, we try to connect the generation probability with the
number of longest paths that might be shortened by a single transition. Hence, instead
of a single longest path, we have to calculate the number of longest paths to which a
single arc $(x, y)$ belongs, where $(x, y)$ is on the path $P(v, w)$ specified by choosing
$v$ and $w$ in the first transition rule. We introduce the following values:
$\nu(z) := \lambda(P(I, z))$, where $P(I, z)$ is a longest path from $I$ to $z$, and
$\kappa[(x, y)] := |\{P \mid P = P'(I, x)(x, y)P''(y, O) \text{ and } \lambda(P) = \lambda(P_{\max})\}|$.
The values $\nu(z)$ and $\kappa[(x, y)]$ can be computed in expected linear time
$O(|V|)$.
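The values $\nu$ and $\kappa$ can be obtained with one forward and one backward dynamic-programming pass over a topological order of $D$, counting longest paths as well as their lengths. The sketch assumes that $D$ is acyclic and that every vertex lies on some $I$-$O$ path, as in the disjunctive digraph. The function name is hypothetical, `nu[z]` includes the weight of $z$ itself, and the running time is linear in the number of vertices and arcs.

```python
from collections import defaultdict

def longest_path_counts(weights, arcs, source, sink):
    """Return (nu, kappa) for an acyclic vertex-weighted digraph.

    nu[z]          length of a longest source->z path (weight of z included)
    kappa[(x, y)]  number of longest source->sink paths using the arc (x, y)
    """
    succ = defaultdict(list)
    indeg = {v: 0 for v in weights}
    for x, y in arcs:
        succ[x].append(y)
        indeg[y] += 1
    # topological order (the orientation is assumed feasible, i.e. acyclic)
    order, stack = [], [v for v in weights if indeg[v] == 0]
    while stack:
        x = stack.pop()
        order.append(x)
        for y in succ[x]:
            indeg[y] -= 1
            if indeg[y] == 0:
                stack.append(y)

    NEG = float("-inf")
    # forward pass: longest source->z lengths and how many paths attain them
    nu = {v: NEG for v in weights}
    cf = {v: 0 for v in weights}
    nu[source], cf[source] = weights[source], 1
    for x in order:
        for y in succ[x]:
            cand = nu[x] + weights[y]
            if cand > nu[y]:
                nu[y], cf[y] = cand, cf[x]
            elif cand == nu[y]:
                cf[y] += cf[x]

    # backward pass: longest z->sink lengths (weight of z included) and counts
    back = {v: NEG for v in weights}
    cb = {v: 0 for v in weights}
    back[sink], cb[sink] = weights[sink], 1
    for x in reversed(order):
        for y in succ[x]:
            cand = back[y] + weights[x]
            if cand > back[x]:
                back[x], cb[x] = cand, cb[y]
            elif cand == back[x]:
                cb[x] += cb[y]

    # an arc lies on a longest path iff the best prefix plus best suffix is maximal
    total = nu[sink]
    kappa = {(x, y): (cf[x] * cb[y] if nu[x] + back[y] == total else 0)
             for x, y in arcs}
    return nu, kappa
```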

Now, the generation probability depends on the uniform probability $1/m$ of choosing a
path $P(z', z'')$ of length $l$ (the number of jobs) such that $M(v) = k$ is fixed for all
vertices $v$ of $P(z', z'')$. Then, for any $(x, y)$ from $P(z', z'')$, the number
$\kappa[(x, y)]$ of longest paths to which $(x, y)$ belongs is calculated. We denote

(3)  $g[(x, y)] := \dfrac{\kappa[(x, y)]}{\sum_{(u, v) \text{ on } P(z', z'')} \kappa[(u, v)]}$.



Next, two independent random choices {x,y), {x',y') are made on P{z',z") in 
accordance with the probability g. If x precedes x' or x = x' , the vertex x is 
taken as v of the first transition rule, and y' is taken as w. 
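Given $\kappa$ for the arcs of $P(z', z'')$, equation (3) and the two independent draws can be sketched as follows. Arcs are passed in path order, and `choose_v_w` is a hypothetical name. The swap when $x'$ precedes $x$ is our assumption, since the text only covers the case where $x$ precedes or equals $x'$.

```python
import random

def choose_v_w(path_arcs, kappa, rng):
    """Pick v and w on one machine path P(z', z'') using g from equation (3).

    path_arcs: arcs (x, y) of P(z', z'') listed in path order.
    kappa:     dict arc -> number of longest paths through that arc.
    Returns (v, w), or None if no arc of the path lies on a longest path.
    """
    total = sum(kappa[a] for a in path_arcs)
    if total == 0:
        return None
    g = [kappa[a] / total for a in path_arcs]            # equation (3)
    i1, i2 = rng.choices(range(len(path_arcs)), weights=g, k=2)
    if i1 > i2:            # assumption: if x' precedes x, swap the two draws
        i1, i2 = i2, i1
    v = path_arcs[i1][0]   # x of the earlier (or same) arc
    w = path_arcs[i2][1]   # y' of the later (or same) arc
    return v, w
```

Arcs that lie on no longest path get $g = 0$ and are never drawn, so the proposed reversal always touches at least one critical arc.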

If the probability of generating a feasible solution $S'$ is denoted by
$g[S' \mid P(z', z'')]$, the generation probability can be expressed by

(4)  $G[S, S'] = \begin{cases} \frac{1}{m} \cdot g[S' \mid P(z', z'')], & \text{if } S' \in \eta(S), \\ 0, & \text{otherwise}. \end{cases}$
The choice of the generation probability has been confirmed by our computational
experiments (see Table 1): for our best solutions on the YN and SWV benchmark
problems we observed between one and five longest paths, while the maximum number
of longest paths was about 50 for the YN and 64 for the SWV benchmark problems,
even for solutions close to the upper bounds. Since the neighborhood function
$\eta_L$ from [21] is a special case of our transition rules 1-4, we have:

Lemma 1 Given $S \in \mathcal F$, there exists $S' \in \eta(S)$ such that $G[S, S'] > 0$.

The acceptance probability $A[S, S']$, $S' \in \eta(S) \subseteq \mathcal F$, is given by:

(5)  $A[S, S'] = \begin{cases} 1, & \text{if } Z(S') - Z(S) \le 0, \\ e^{-\frac{Z(S') - Z(S)}{c}}, & \text{otherwise}, \end{cases}$



where $c$ is a control parameter having the interpretation of a temperature in
annealing procedures. Finally, the probability of performing the transition between
$S$ and $S'$, $S, S' \in \mathcal F$, is defined by

(6)  $\Pr\{S \to S'\} = \begin{cases} G[S, S'] \cdot A[S, S'], & \text{if } S' \ne S, \\ 1 - \sum_{Q \ne S} G[S, Q] \cdot A[S, Q], & \text{otherwise}. \end{cases}$



Let $a_S(k)$ denote the probability of being in the configuration $S$ after $k$ steps
performed for the same value of $c$. The probability $a_S(k)$ can be calculated in
accordance with

(7)  $a_S(k) := \sum_{Q} a_Q(k-1) \cdot \Pr\{Q \to S\}$.

The recursive application of (7) defines a Markov chain of probabilities $a_S(k)$. If the
parameter $c = c(k)$ is a constant $c$, the chain is said to be a homogeneous Markov
chain; otherwise, if $c(k)$ is lowered at any step, the sequence of probability vectors
$a(k)$ is an inhomogeneous Markov chain.
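For a toy configuration space, equations (5) to (7) can be iterated directly to watch a homogeneous chain concentrate on the minimum as $c$ becomes small. The sketch assumes uniform generation over neighbors instead of the biased probability (4). The three-state landscape and the function names are purely illustrative.

```python
import math

def transition_matrix(Z, neighbors, c):
    """Pr{S -> S'} from (6), with uniform generation over neighbors (a toy
    stand-in for (4)) and the acceptance rule (5) at temperature c."""
    n = len(Z)
    P = [[0.0] * n for _ in range(n)]
    for s in range(n):
        G = 1.0 / len(neighbors[s])
        for t in neighbors[s]:
            dz = Z[t] - Z[s]
            A = 1.0 if dz <= 0 else math.exp(-dz / c)
            P[s][t] = G * A
        P[s][s] = 1.0 - sum(P[s][t] for t in neighbors[s])   # rejected moves stay
    return P

def step(a, P):
    """Equation (7): a_S(k) = sum_Q a_Q(k-1) * Pr{Q -> S}."""
    n = len(a)
    return [sum(a[q] * P[q][s] for q in range(n)) for s in range(n)]

# Usage: a line of three configurations with Z = 0, 1, 2, started at the worst one.
Z = [0, 1, 2]
nbrs = [[1], [0, 2], [1]]
P = transition_matrix(Z, nbrs, c=0.1)
a = [0.0, 0.0, 1.0]
for _ in range(200):
    a = step(a, P)
```

At a small temperature almost all probability mass ends up on the minimum, whereas rerunning the same loop with a large `c` leaves it spread over all three states, which is the behavior summarized in (8).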

As pointed out in [21] (see Section 2 there), the convergence to minimum elements of
$\mathcal F_{\min}$ is based on a subdivision of recurrent computations into irreducible
ergodic sets and an additional set of transient elements of $\mathcal F$, respectively.
From transient feasible solutions, elements of ergodic sets can be reached, but not vice
versa. Thus, if $\mathcal F_{\min}$ is reachable with a non-zero probability from any
$S \in \mathcal F$, the following convergence properties can be shown for infinite Markov
chains:

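(8)  $\lim_{c \to 0} \Big( \lim_{k \to \infty} \sum_{S \notin \mathcal F_{\min}} a_S(k) \Big) = 0; \qquad \lim_{c \to 0} \Big( \lim_{k \to \infty} \sum_{S_0 \in \mathcal F_{\min}} a_{S_0}(k) \Big) = 1.$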

Since Lemma 1 provides that $\mathcal F_{\min}$ is reachable from any $S \in \mathcal F$, we obtain:

Theorem 2 For Markov chains defined by (4), (5) and (7), the probability to be in a
feasible solution $S_0 \in \mathcal F_{\min}$ is equal to 1 after an infinite number of steps
and for a decreasing control parameter $c \to 0$.
Because the computation of an infinite Markov chain cannot be performed in practice,
the calculations have to be interrupted at any fixed "temperature" $c$ after a certain
number of steps. Hence, one has to define some heuristic rules bounding the number
$L_c$ of transition steps for fixed values $c := c(t)$. Furthermore, it is necessary to
determine how the parameter $c(t)$ has to be changed. These problems are discussed
in the next section, where two cooling schedules are defined and their complexity is
analyzed.

Additionally, we consider a third cooling schedule which defines a special type of
inhomogeneous Markov chains. For this cooling schedule, the value $c(k)$ changes in
accordance with

(9)  $c(k) = \dfrac{\Gamma}{\ln(k + 2)}, \quad k = 0, 1, \ldots$
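An inhomogeneous run with the logarithmic schedule (9) differs from the homogeneous case only in recomputing $c(k)$ at every step. Below is a minimal sketch, assuming uniform generation over a user-supplied neighbor function. `anneal_log` and the one-dimensional landscape in the usage lines are hypothetical and do not reproduce the job shop neighborhood.

```python
import math
import random

def log_cooling(k, gamma):
    """Equation (9): c(k) = Γ / ln(k + 2)."""
    return gamma / math.log(k + 2)

def anneal_log(start, Z, neighbors, gamma, steps, seed=0):
    """Simulated annealing with the logarithmic schedule; returns (last, best)."""
    rng = random.Random(seed)
    s, best = start, start
    for k in range(steps):
        t = rng.choice(neighbors(s))           # uniform generation over neighbors
        dz = Z(t) - Z(s)
        # acceptance rule (5) evaluated at the current temperature c(k)
        if dz <= 0 or rng.random() < math.exp(-dz / log_cooling(k, gamma)):
            s = t
        if Z(s) < Z(best):
            best = s
    return s, best

# Usage: walk down a toy landscape Z(x) = x on the integers 0..10.
Z = lambda x: x
nb = lambda x: [y for y in (x - 1, x + 1) if 0 <= y <= 10]
final, best = anneal_log(10, Z, nb, gamma=1.0, steps=5000, seed=42)
```

Because $c(k)$ decreases only logarithmically, uphill moves of height $\Delta$ are still accepted with probability roughly $(k+2)^{-\Delta/\Gamma}$ late in the run; Theorem 3 ties the admissible $\Gamma$ to the depth of local minima.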



The choice of $c(k)$ is motivated by Hajek's Theorem [8] on logarithmic cooling
schedules for inhomogeneous Markov chains. If there exist
$S_0, S_1, \ldots, S_r \in \mathcal F$ ($S_0 = S \wedge S_r = S'$) such that
$G[S_u, S_{u+1}] > 0$, $u = 0, 1, \ldots, (r-1)$, and $Z(S_u) \le h$ for all
$u = 0, 1, \ldots, r$, we denote $\mathrm{height}(S \Rightarrow S') \le h$. The schedule $S$
is a local minimum if $S \in \mathcal F \setminus \mathcal F_{\min}$ and $Z(S') \ge Z(S)$ for
all $S' \in \eta_L(S) \setminus \{S\}$. By $\mathrm{depth}(S_{\min})$ we denote the smallest
$h$ such that there exists an $S' \in \mathcal F$ with $Z(S') < Z(S_{\min})$ which is
reachable at height $Z(S_{\min}) + h$.

The following convergence property has been proved by B. Hajek: 



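Theorem 3 [8] Given a configuration space $\mathcal C$ and a cooling schedule defined by

$c(k) = \dfrac{\Gamma}{\ln(k + 2)}, \quad k = 0, 1, \ldots,$

the asymptotic convergence $\sum_{H \in \mathcal C_{\min}} a_H(k) \to 1$ ($k \to \infty$) of the
stochastic algorithm, which is based on (2), (5), and (6), is guaranteed if and only if

(i) $\forall H, H' \in \mathcal C\ \exists H_0, H_1, \ldots, H_r \in \mathcal C\ (H_0 = H \wedge H_r = H'): G[H_u, H_{u+1}] > 0$, $u = 0, 1, \ldots, (r-1)$;

(ii) $\forall h: \mathrm{height}(H \Rightarrow H') \le h \iff \mathrm{height}(H' \Rightarrow H) \le h$;

(iii) $\Gamma \ge \max_{H_{\min}} \mathrm{depth}(H_{\min})$.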

Hence, the speed of convergence associated with the logarithmic cooling schedule (9)
is mainly defined by the value of $\Gamma$. Condition (i) expresses the connectivity of
the configuration space. In our case of $\mathcal F$, the mutual reachability of schedules
cannot be guaranteed, as we will show in Section 5. Therefore, Hajek's result cannot
be applied to our scheduling problem.



4 Two Simulated Annealing-Based Heuristics 



The following section describes the main parameters of the two cooling schedules
designed for simulated annealing-based heuristics. For both cooling schedules, the
starting "temperature" $c(0)$ is defined by

(10)  $e^{-\Delta Z_{\max}/c(0)} = 1 - p_1, \qquad c(0) = -\dfrac{\Delta Z_{\max}}{\ln(1 - p_1)},$

where $p_1$ is a small positive value.

The decremental rule of the first cooling schedule is given by the simple relation
$c(t+1) := (1 - p_2) \cdot c(t)$, where $p_2$ is a small value larger than zero. The
stopping criterion is related to the expected number $R_c(S)$ of trials that are
necessary for leaving a given configuration $S$. In case of $T < R_c(S)$, it is indeed
time to finish the procedure of simulated annealing. By straightforward calculations,
one can show that for arbitrary $S \in \mathcal F$, $R_c(S) < \ldots$, where
$\Delta Z_{\max} := \max_{S \in \mathcal F} \max_{S' \in \eta(S)} |Z(S') - Z(S)|$. Therefore,
one obtains the following conditions:

/. ^max 

( 11 ) 

Let $\chi(c)$ denote the expected ratio of the total number of processed trials to the
length $L_c$ at temperature $c$. The number $t_{\mathrm{fin}}$ denotes the number of
cooling steps, and we define the average acceptance rate by setting
$\chi := \frac{1}{t_{\mathrm{fin}}} \sum_c \chi(c)$. Hence, for the length $L$ of Markov chains,
the number of processed trials from $c(0)$ until $c(t_{\mathrm{fin}})$ is given by
$\chi \cdot t_{\mathrm{fin}} \cdot L$. Furthermore, let $T$ denote an upper bound for the time
needed to perform the updating of the objective function and the decisions in
accordance with (5). Now, the number $t_{\mathrm{fin}}$ of steps reducing the parameter
$c(t)$ can be calculated from

$(1 - p_2)^{t_{\mathrm{fin}}} \cdot c(0) = c(t_{\mathrm{fin}})$,

which implies

$t_{\mathrm{fin}} = \dfrac{\ln\big(c(t_{\mathrm{fin}})/c(0)\big)}{\ln(1 - p_2)}$.

Therefore, the number of cooling steps does not depend on the objective function.
Given the length of Markov chains $L$, the algorithm has to perform $L \cdot t_{\mathrm{fin}}$
accepted moves before the algorithm halts because of (11).
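The parameters of the first cooling schedule can be computed as sketched below. The sketch assumes that values for $\Delta Z_{\max}$, $p_1$, $p_2$ and the final temperature are simply given, and it does not reproduce the stopping condition (11). The function names are hypothetical. `cooling_steps` evaluates the closed form for $t_{\mathrm{fin}}$ (rounded up), which depends only on $p_2$ and the ratio $c(t_{\mathrm{fin}})/c(0)$.

```python
import math

def initial_temperature(delta_z_max, p1):
    """Equation (10): the worst uphill move is accepted with probability 1 - p1."""
    return -delta_z_max / math.log(1.0 - p1)

def geometric_schedule(c0, p2, c_fin):
    """First cooling schedule: c(t+1) = (1 - p2) * c(t) until c(t) <= c_fin."""
    temps = [c0]
    while temps[-1] > c_fin:
        temps.append((1.0 - p2) * temps[-1])
    return temps

def cooling_steps(c0, p2, c_fin):
    """Smallest integer t_fin with (1 - p2)**t_fin * c0 <= c_fin."""
    return math.ceil(math.log(c_fin / c0) / math.log(1.0 - p2))

# Usage: ΔZ_max = 10, p1 = 0.1, p2 = 0.05, stop once c(t) <= 1.
c0 = initial_temperature(10.0, 0.1)
temps = geometric_schedule(c0, 0.05, 1.0)
```

Scaling $c(0)$ and the final temperature by the same factor leaves the number of cooling steps unchanged, since only their ratio enters the formula.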

Theorem 4 For the first cooling schedule, the expected run-time is bounded by 



ln(l - pi) 
InL 



)-T-X- 



The complexity of updating the objective function and further auxiliary operations can
be upper bounded by $O(n)$. Thus, if we assume a square complexity for the basic
arithmetic operations, we can use the time bound $T = O(n \cdot \ln^2 n)$. Based on
(4), we have for the size of the neighborhood
$|\eta(S)| \le m \cdot l \cdot (l - 1)/2 = O(n^2/m)$. Hence, we obtain:

Corollary 1 For the neighborhood relation T]{S), the following upper bound of 
the expected run-time can be derived: 

T,^0(^-ln2n-x) 
For the second cooling schedule, the control parameter $c(t)$ is decreased by
the rule 

(13) c(t+l)-= 

where ‘^{pz) is defined by p{p-.i) ln(l + p'i)/{Z'^^^ — Zmin) < 1 (see equation 
(15) in [20]). 

Our stopping criterion is derived from the condition 



^ j ^ ^ _ I - 

(14) C(t) • |e=e(t) < e • 

i.e., the changes of the objective function are very small compared to the ex- 
pected initial value of Z at c(0). In our specific case the condition leads after a 
series of transformations to the following inequality: 



(15) 






If we assume integer values for the processing times $p(t)$, the minimum improvement
of the objective function is lower bounded by 1. Hence, we can take
$\varepsilon := 1/Z^{\max}$ as a lower bound for $\varepsilon$. From (15) we can derive the
following upper bound of cooling steps:



(16) 



tfin ^ 



-ln(l -pi)-ln|,F| 

^ py 



in case of $\varepsilon := 1/Z^{\max}$. The upper bound is related to the approximation (15).
Finally, we have:

Theorem 5 For the length $L$ of Markov chains, $\varepsilon := 1/Z^{\max}$, and the second
cooling schedule, the expected run-time can be upper bounded by



Tii^L- 



-ln(l - pi) 
ln"(l + Ps) 




^min) - T -X- 



We again use $T = O(n \cdot \ln^2 n)$ and obtain:

Corollary 2 The following upper bound of the expected run-time is valid for the 
second cooling schedule: 

In the second cooling schedule, the run-time is longer compared to the bound 
given in Theorem 4, but one has a better control of the final outcome because 
the objective function is explicitly used in this cooling schedule. 
5 The Logarithmic Cooling Schedule



In this section, we consider a uniform generation probability which is based on the
neighborhood $\eta_L$ introduced by Van Laarhoven et al. [21]:

(17)  $G[S, S'] = \dfrac{1}{|\eta_L(S)|}$.



Before we perform the convergence analysis of the logarithmic cooling schedule
defined in (9), we point out some properties of the configuration space and the
neighborhood function. Condition (i) of Hajek's Theorem (see Theorem 3 in
Section 3) requires that for every two schedules $S, S'$ there exists a finite sequence of
transitions leading from $S$ to $S'$. It is not difficult to construct a problem instance
containing pairs of schedules $S, S'$ for which such a finite sequence does not exist.
Moreover, not every transition move is reversible.

Let $S$ and $S'$ be feasible schedules with $S' \in \eta_L(S)$. To obtain $S'$ from $S$, we
choose the arc $e = (v, w)$ with $e \in P_{\max}$. If $Z(S) > Z(S')$, it is not guaranteed
that the reversed arc $e' = (w, v)$ belongs to a longest path of $D(S')$; therefore, the
move might not be reversible.

Lemma 2 Any transition move which increases the value of the objective func- 
tion is reversible. 



If the value of the objective function increases, only a path containing one 
of the selected vertices v, w can determine the new makespan after a transition 
move. It can be shown that all paths whose length increases contain the edge 
e' = {w,v). Since e' belongs to a longest path it can be selected for the next 
transition move and the previous step will be reversed. The same idea is used to 
prove the upper bound $(p(v) + p(w))$ for the increase of the objective function
value within a single transition step. 

Lemma 3 The increase $\Delta Z$ of the objective function in a single step according to
$\eta_L$ ($S \to S'$) can be upper bounded by $(p(v) + p(w))$.

To express the relation between $S$ and $S'$ according to their values of the objective
function, we will use $<_Z$, $>_Z$, and $=_Z$:

$S <_Z S'$ instead of $S' \in \eta_L(S) \wedge (Z(S) < Z(S'))$,

$S >_Z S'$ instead of $S' \in \eta_L(S) \wedge (Z(S) > Z(S'))$,

$S =_Z S'$ instead of $S \ne S' \wedge S' \in \eta_L(S) \wedge (Z(S) = Z(S'))$.

The notations ${}_Z{<}$, ${}_Z{>}$, and ${}_Z{=}$ will be used for the analogous relations
between $S$ and $S'$ in case $S$ can be generated from $S'$. Furthermore, we denote:

$p(S) := |\{S' : S <_Z S'\}|$, $\quad \tilde p(S) := |\{S' : S \;{}_Z{<}\; S'\}|$,

$q(S) := |\{S' : S =_Z S'\}|$, $\quad \tilde q(S) := |\{S' : S \;{}_Z{=}\; S'\}|$,

$r(S) := |\{S' : S >_Z S'\}|$, $\quad \tilde r(S) := |\{S' : S \;{}_Z{>}\; S'\}|$.

These notations imply

(18)  $p(S) + q(S) + r(S) = |\eta_L(S)| - 1$.
Lemma 4 For all $S \in \mathcal F$, the relations $p(S) \le \tilde p(S)$ and
$r(S) \ge \tilde r(S)$ are valid. The relation between $q(S)$ and $\tilde q(S)$ is not
predetermined. Note that strict inequality between $p, \tilde p$ and between $r, \tilde r$ is
possible.

The relations follow from Lemma 2. For the relation $p(S) \le \tilde p(S)$, we note that
there might be a schedule $S'$ which reaches $S$ with a single decreasing transition
step, but since a decreasing step is not guaranteed to be reversible, $S'$ is not
necessarily a neighbor of $S$. Therefore, strict inequality between $p$ and $\tilde p$ is
possible. The relation $r(S) \ge \tilde r(S)$ is treated in a similar way.
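The six counts can be computed directly from an explicit neighborhood relation, which makes Lemma 4 easy to check on small examples. The sketch below assumes $\eta_L(S)$ is given as a dictionary of sets that contains $S$ itself, as implied by (18). The function name is hypothetical.

```python
def level_counts(S, Z, eta):
    """Return ((p, q, r), (p~, q~, r~)) for schedule S.

    p, q, r:    neighbors S' in eta[S] (S' != S) with Z(S') >, ==, < Z(S)
    p~, q~, r~: schedules S' != S that can generate S (S in eta[S'])
                with Z(S') >, ==, < Z(S)
    """
    out = [t for t in eta[S] if t != S]
    p = sum(Z[t] > Z[S] for t in out)
    q = sum(Z[t] == Z[S] for t in out)
    r = sum(Z[t] < Z[S] for t in out)
    into = [t for t in eta if t != S and S in eta[t]]
    p_t = sum(Z[t] > Z[S] for t in into)
    q_t = sum(Z[t] == Z[S] for t in into)
    r_t = sum(Z[t] < Z[S] for t in into)
    return (p, q, r), (p_t, q_t, r_t)

# Usage: increasing moves are reversible (Lemma 2), but the decreasing
# move from A to C is not, so p(C) < p~(C) and r(A) > r~(A).
Z = {'A': 2, 'B': 1, 'C': 0}
eta = {'A': {'A', 'B', 'C'}, 'B': {'B', 'A'}, 'C': {'C'}}
```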

Now, we analyze the probability $a_S(k)$ to be in the schedule $S \in \mathcal F$ after $k$
steps of the logarithmic cooling schedule defined in (9), and we use the notation



(19) 




c(fc) 



, k > 0. 



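By using (6) and (17), one obtains from (7) by straightforward calculations

$$a_S(k) = a_S(k-1)\cdot\Big(\frac{p(S)+1}{|\eta_L(S)|} - \sum_{\substack{i=1\\ S <_Z S_i}}^{p(S)} \frac{1}{|\eta_L(S)|}\cdot\frac{1}{(k+1)^{(Z(S_i)-Z(S))/\Gamma}}\Big) + \sum_{\substack{i=1\\ S\,{}_Z{<}\,S_i \,\vee\, S\,{}_Z{=}\,S_i}}^{\tilde{p}(S)+\tilde{q}(S)} \frac{a_{S_i}(k-1)}{|\eta_L(S_i)|} + \sum_{\substack{j=1\\ S\,{}_Z{>}\,S_j}}^{\tilde{r}(S)} \frac{a_{S_j}(k-1)}{|\eta_L(S_j)|}\cdot\frac{1}{(k+1)^{(Z(S)-Z(S_j))/\Gamma}}.$$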
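The one-step expansion above can be checked numerically. The following Python sketch illustrates the expansion under stated assumptions and is not the authors' implementation. It assumes:

- each neighborhood $\eta_L(S)$ contains $S$ itself, as (18) requires;
- a proposal is drawn uniformly from $\eta_L(S)$;
- an increase $\Delta > 0$ is accepted with probability $(k+1)^{-\Delta/\Gamma}$, i.e. $c(k-1) = \Gamma/\ln(k+1)$.

The three-schedule configuration space and its objective values are hypothetical. The code checks (18), checks that each diagonal entry of the transition matrix equals the bracketed self-transition factor, and iterates the chain.

```python
"""Toy check of the one-step expansion of a_S(k) under logarithmic cooling."""


def relation_counts(s, neighbors, Z):
    """(p, q, r): neighbours of s with higher, equal, and lower objective (cf. (18))."""
    others = [t for t in neighbors[s] if t != s]
    p = sum(Z[t] > Z[s] for t in others)
    q = sum(Z[t] == Z[s] for t in others)
    r = sum(Z[t] < Z[s] for t in others)
    return p, q, r


def transition_matrix(neighbors, Z, k, gamma):
    """P[s][t]: probability of moving from s to t in step k (sketch)."""
    P = {}
    for s, nbh in neighbors.items():
        row = {}
        for t in nbh:
            if t != s:
                delta = Z[t] - Z[s]
                accept = (k + 1) ** (-delta / gamma) if delta > 0 else 1.0
                row[t] = accept / len(nbh)
        row[s] = 1.0 - sum(row.values())  # rejected or self proposals stay at s
        P[s] = row
    return P


def self_factor(s, neighbors, Z, k, gamma):
    """(p(S)+1)/|eta_L(S)| minus the uphill sum, as in the bracket above."""
    nbh = neighbors[s]
    uphill = [t for t in nbh if Z[t] > Z[s]]
    return (len(uphill) + 1) / len(nbh) - sum(
        (k + 1) ** (-(Z[t] - Z[s]) / gamma) / len(nbh) for t in uphill
    )


def step(a, P):
    """a_S(k) = sum over S' of a_{S'}(k-1) * P[S'][S]."""
    new = {s: 0.0 for s in a}
    for s_prev, row in P.items():
        for s, prob in row.items():
            new[s] += a[s_prev] * prob
    return new


# Hypothetical configuration space: A is optimal, C is a local minimum.
NEIGHBORS = {"A": ["A", "B"], "B": ["A", "B", "C"], "C": ["B", "C"]}
Z = {"A": 0, "B": 2, "C": 1}
GAMMA = 2.0

a = {"A": 0.0, "B": 0.0, "C": 1.0}  # start in the local minimum
for k in range(1, 1001):
    a = step(a, transition_matrix(NEIGHBORS, Z, k, GAMMA))
print({s: round(x, 4) for s, x in a.items()})
```

Because every row of the matrix sums to one, the total probability is conserved. This is the fact used below when the terms containing $a_S(k-1)$ are collected.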



The representation (expansion) will be used in the following as the main relation reducing $a_S(k)$ to probabilities from previous steps. We introduce the following partition of the set of schedules with respect to the value of the objective function:

$$L_0 := \mathcal{F}_{\min}; \qquad L_{h+1} := \Big\{S : S \in \mathcal{F}\setminus\bigcup_{l=0}^{h} L_l \;\wedge\; \forall S'\Big(S' \in \mathcal{F}\setminus\bigcup_{l=0}^{h} L_l \rightarrow Z(S') \ge Z(S)\Big)\Big\}.$$

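The highest level within $\mathcal{F}$ is denoted by $L_{h_{\max}}$. Given $S \in \mathcal{F}$, we further denote by $W_{\min}(S) := [S, S_{k-1}, \ldots, S']$ a shortest sequence of transitions from $S$ to $\mathcal{F}_{\min}$, i.e., $S' \in \mathcal{F}_{\min}$. We denote by $d(S) := \mathrm{length}(W_{\min}(S))$ the distance of $S$ from $\mathcal{F}_{\min}$. We introduce another partition of $\mathcal{F}$ with respect to $d(S)$:

$$S \in M_i \;:\Longleftrightarrow\; d(S) = i \ge 0, \qquad \mathcal{M}_s := \bigcup_{i=0}^{s-1} M_i, \quad \text{i.e., } \mathcal{F} = \mathcal{M}_s.$$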
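For a finite configuration space, both partitions can be computed directly. The sketch below is illustrative and assumes symmetric neighborhoods ($S' \in \eta_L(S)$ if and only if $S \in \eta_L(S')$). Under that assumption, a breadth-first search started simultaneously from all optima yields $d(S)$. The dictionaries and the helper names `objective_levels` and `distance_levels` are hypothetical.

```python
from collections import deque


def objective_levels(Z):
    """h(S) with S in L_h: rank of Z(S) among the distinct objective values."""
    rank = {v: h for h, v in enumerate(sorted(set(Z.values())))}
    return {s: rank[z] for s, z in Z.items()}


def distance_levels(neighbors, Z):
    """d(S): fewest transitions from S to F_min, via multi-source BFS."""
    z_min = min(Z.values())
    dist = {s: 0 for s in Z if Z[s] == z_min}  # M_0 = F_min
    queue = deque(dist)
    while queue:
        s = queue.popleft()
        for t in neighbors[s]:
            if t not in dist:
                dist[t] = dist[s] + 1
                queue.append(t)
    return dist


# Hypothetical chain of four schedules with the optimum at one end.
NEIGHBORS = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2, 3], 3: [2, 3]}
Z = {0: 5, 1: 3, 2: 4, 3: 1}
print(objective_levels(Z))            # level index h of each schedule
print(distance_levels(NEIGHBORS, Z))  # distance index i of each schedule
```

In this example schedule 2 lies on objective level $L_2$ but in distance level $M_1$, which shows why the two partitions must be kept apart.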

Thus, we distinguish between two kinds of levels:

- the distance levels $M_i$, related to the minimal number of transitions required to reach an optimal schedule in $\mathcal{F}_{\min}$;
- the levels $L_h$, which are defined by the objective function.

By definition, we have $M_0 := L_0 = \mathcal{F}_{\min}$. We will use the following abbreviations:

$$f(S, S', t) := \frac{1}{(k+2-t)^{(Z(S)-Z(S'))/\Gamma}}, \tag{20}$$

$$D_S(k-t) := \frac{p(S)+1}{|\eta_L(S)|} - \sum_{\substack{i=1\\ S <_Z S_i}}^{p(S)} \frac{1}{|\eta_L(S)|}\cdot\frac{1}{(k+2-t)^{(Z(S_i)-Z(S))/\Gamma}}. \tag{21}$$

During the expansion of $\sum_{S\notin M_0} a_S(k)$, terms corresponding to $S$ are generated, as well as terms corresponding to all neighbors $S'$ of $S$. Some terms generated by the expansion of $a_S(k)$ contain the factor $a_{S'}(k-1)$ and can therefore be combined with terms generated by the expansion of $a_{S'}(k)$. However, it is important to distinguish between elements of $M_1$ and elements of $M_i$, $i \ge 2$. For all $S \notin M_0 \cup M_1$, we obtain:

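$$a_S(k-1)\cdot\Big(\frac{p(S)+q(S)+r(S)+1}{|\eta_L(S)|} - \sum_{\substack{i=1\\ S<_Z S_i}}^{p(S)}\frac{1}{|\eta_L(S)|}\cdot\frac{1}{(k+1)^{(Z(S_i)-Z(S))/\Gamma}} + \sum_{\substack{i=1\\ S<_Z S_i}}^{p(S)}\frac{1}{|\eta_L(S)|}\cdot\frac{1}{(k+1)^{(Z(S_i)-Z(S))/\Gamma}}\Big) = a_S(k-1).$$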



In case of $S \in M_1$, some neighbors $S'$ of $S$ are elements of $M_0$. These neighbors do not generate the terms related to $S >_Z S'$, because the $a_{S'}(k)$ are not expanded: they are not present in the sum $\sum_{S\notin M_0} a_S(k)$. Therefore, $r'(S) \le r(S)$ terms are missing for $S \in M_1$, and the following arithmetic term is generated:

$$a_S(k-1)\cdot\Big(1 - \frac{r'(S)}{|\eta_L(S)|}\Big), \tag{22}$$

where $r'(S) := |\{S' : S' \in \eta_L(S) \wedge S' \in M_0\}|$. On the other hand, the expansion of $a_S(k)$ generates terms related to $S' \in M_0$ with $S\;{}_Z{>}\;S'$ that contain $a_{S'}(k-1)$ as a factor. Those terms are not canceled by expansions of $a_{S'}(k)$. All $S \in M_1$ therefore generate the following term:



$$\sum_{\substack{j=1\\ S_j \in M_0\cap\eta_L(S)}}^{r'(S)} \frac{a_{S_j}(k-1)}{|\eta_L(S_j)|}\cdot\frac{1}{(k+1)^{(Z(S)-Z(S_j))/\Gamma}}. \tag{23}$$



Now, we consider the entire sum and take the negative product $a_S(k-1)\cdot r'(S)/|\eta_L(S)|$ separately. By using the abbreviations introduced in (21), we derive the following lemma.

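Lemma 5 After one step of the expansion of $\sum_{S\notin M_0} a_S(k)$, the sum can be represented by

$$\sum_{S\notin M_0} a_S(k) = \sum_{S\notin M_0} a_S(k-1) - \sum_{S\in M_1}\frac{r'(S)}{|\eta_L(S)|}\cdot a_S(k-1) + \sum_{S\in M_1}\;\sum_{\substack{j=1\\ S_j\in M_0\cap\eta_L(S)}}^{r'(S)}\frac{a_{S_j}(k-1)}{|\eta_L(S_j)|}\cdot\frac{1}{(k+1)^{(Z(S)-Z(S_j))/\Gamma}}.$$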
The diminishing factor $(1 - r'(S)/|\eta_L(S)|)$ appears by definition for all elements of $M_1$. At subsequent reduction steps, the factor is "transmitted" successively to all probabilities from higher distance levels $M_i$, because any element of $M_i$ has at least one neighbor from $M_{i-1}$. The main task is now to analyze how this diminishing factor changes when it is propagated to the next higher distance level. We denote

$$\sum_{S\notin M_0} a_S(k) = \sum_{S\notin M_0}\mu(S,t)\cdot a_S(k-t) + \sum_{S'\in M_0}\mu(S',t)\cdot a_{S'}(k-t), \tag{24}$$

i.e., the coefficients $\mu(S,t)$ and $\mu(S',t)$ are the factors at probabilities after $t$ steps of an expansion of $\sum_{S\notin M_0} a_S(k)$. Hence, for $S\in M_1$ we have $\mu(S,1) = 1 - r'(S)/|\eta_L(S)|$, and $\mu(S,1) = 1$ for the remaining $S \in \mathcal{M}_s\setminus(M_0\cup M_1)$. For $S' \in M_0$ we have from Lemma 5:



$$\mu(S',1) = \sum_{\substack{i=1\\ S_i\in M_1 \wedge S'\in\eta_L(S_i)}}^{p(S')}\frac{f(S_i, S', 1)}{|\eta_L(S')|}. \tag{25}$$



Starting from step $(k-1)$, the generated probabilities $a_{S'}(k-u)$ are expanded in the same way as all other probabilities. We set $\mu(S,j) := 1 - \nu(S,j)$ because we are mainly interested in the convergence $\sum_{S\notin M_0} a_S(k) \to 0$. We perform an inductive step from $(k-t+1)$ to $(k-t)$ and obtain for $t \ge 2$:

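Lemma 6 The following recurrence relation is valid for $\nu(S,t)$, $t \ge 2$:

$$\nu(S,t) = \nu(S,t-1)\cdot D_S(k-t) + \sum_{\substack{S'\in\eta_L(S)\setminus\{S\}\\ Z(S')\le Z(S)}}\frac{\nu(S',t-1)}{|\eta_L(S)|} + \sum_{S <_Z S''}\frac{\nu(S'',t-1)}{|\eta_L(S)|}\cdot f(S'',S,t).$$

Furthermore, for the three special cases below we have:

- $S\in M_j$, $j>t$: $\nu(S,t) = 0$;
- $S\in M_1$, $t=1$: $\nu(S,t) = r'(S)/|\eta_L(S)|$;
- $S\in M_0$, $t=1$: $\nu(S,t) = 1 - \sum f(S_i,S,1)/|\eta_L(S)|$, with $S_i\in M_1\wedge S\in\eta_L(S_i)$.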

Exactly the same structure of the equation is valid for $\mu(S,t)$. It will be used for elements of $M_0$ only, because these elements are not present in the original sum $\sum_{S\notin M_0} a_S(k)$. Now, any $\nu(S,t)$ and $\mu(S,t)$ is expressed by a sum $\sum_\alpha T_\alpha$ of arithmetic terms. We consider in more detail the terms associated with elements $S^0$ of $M_0$ and $S^1$ of $M_1$. We assume a representation $\mu(S^0,t-1) = \sum T(S^0)$ and $\nu(S,t-1) = \sum T(S)$, $S\notin M_0$.

If we consider $r'(S^1)/|\eta_L(S^1)|$ and $\sum_{S^0<_Z S^1} f(S^1,S^0,t)/|\eta_L(S^0)|$ separately, the difficulties arising from the definition $\nu(S,t) := 1 - \mu(S,t)$ can be avoided. We then only have to take into account the changing signs of terms during the transmission from $M_1$ to $M_0$ and vice versa.

Definition 3 The expressions $r'(S^1)/|\eta_L(S^1)|$ and $\sum_{S^0<_Z S^1} f(S^1,S^0,t)/|\eta_L(S^0)|$ are called source terms of $\nu(S^1,t)$ and $\mu(S^0,t)$, respectively.

During an expansion of $\sum_{S\notin M_0} a_S(k)$, the source terms are distributed permanently to higher distance levels $M_j$. Therefore, at higher distance levels the notion of a source term can be defined by an inductive step:
Definition 4 For all $S\in M_i$, $i>1$, any term which is generated according to the equation of Lemma 6 from a source term of $\nu(S',t-1)$, where $S'\in M_{i-1}\cap\eta_L(S)$, is said to be a source term of $\nu(S,t)$.

We introduce a counter $e(T)$ for terms $T$ which indicates the step at which the term has been generated from source terms. The value $e(T)$ is called the rate of a term, and we set $e(T) = 1$ for source terms $T$.

The value $e(T) > 1$ is assigned to terms related to $M_0$ and $M_1$ in a slightly different way compared to higher distance levels, because at the first step the $S^0$ do not participate in the expansion of $\sum_{S\notin M_0} a_S(k)$. Furthermore, in the case of $M_0$ and $M_1$ we have to take into account the changing signs of terms which result from the simultaneous consideration of $\nu(S^1,t)$ and $\mu(S^0,t)$.

Definition 5 A term $T^0$ is called a $j$-th rate term of $\mu(S^0,t)$, $j \ge 2$, if either of the following holds:

- $T^0 = -T$ and $e(T) = j-1$ for some $\nu(S,t-1)$, $S \in M_1 \cap \eta_L(S^0)$;
- $e(T^0) = j-1$ for some $\mu(S',t-1)$, $S' \in M_0 \cap \eta_L(S^0)$.

A term $T$ is called a $j$-th rate term of $\nu(S^1,t)$, $j \ge 2$, if either of the following holds:

- $e(T) = j-1$ for some $\nu(S,t-1)$, $S \in (M_1 \cup M_2) \cap \eta_L(S^1)$;
- $T = -T'$ and $e(T') = j-1$ for some $S' \in M_0 \cap \eta_L(S^1)$ with respect to $\mu(S',t-1)$.

A term $T$ is called a $j$-th rate term of $\nu(S,t)$, $S \in M_i$ and $i, j \ge 2$, if either of the following holds:

- $e(T) = j-1$ for some $\nu(S',t-1)$, $S' \in (M_i \cup M_{i+1}) \cap \eta_L(S)$;
- $T$ is a $j$-th rate term of $\nu(S'',t-1)$ for some $S'' \in M_{i-1}$.

Let $\mathcal{T}_j(S,t)$ be the set of $j$-th rate arithmetic terms of $\nu(S,t)$ ($\mu(S^0,t)$) related to $S\in\mathcal{M}_s$. We set $A_j(S,t) := \sum_{T\in\mathcal{T}_j(S,t)} T$. The same notation is used in case of $S = S^0$ with respect to $\mu(S^0,t)$.

Lemma 7 If $S\in M_i$, $M_i \neq M_0$, then $A_j(S,t) = 0$ for $j > t-i+1$. For $S^0$ the condition $j > t$ implies $A_j(S^0,t) = 0$, and

$$\nu(S,t) = \sum_{j=1}^{t-i+1} A_j(S,t) \quad\text{and}\quad \mu(S^0,t) = \sum_{j=1}^{t} A_j(S^0,t).$$

In order to simplify the analysis of products of factors $D_S(k-t)$ in positive arithmetic terms, we consider the following representation. From Lemma 3 we have

$$Z(S') - Z(S) \le p(w) + p(v). \tag{26}$$

The upper bound is applied to: 

The general structure of reductions is explained by 

Lemma 8 The sum $A_j(S,t)$ of $j$-th rate terms of $\nu(S,t)$ is equal to

$$\sum_{S_q\in\mathcal{M}_s\setminus M_0}\mathcal{P}_S(j,t-q)\cdot A_p(S_q,t-q) - \sum_{S'\in M_0}\mathcal{P}_S(j,t-q)\cdot A_{p'}(S',t-q),$$

where $p \le (t-q)-i+1$, $S_q\in M_i\neq M_0$, and $p'\le t-q$. The $\mathcal{P}_S(j,t-q)$ denote products of two kinds of factors:

- $\frac{1}{(k+2-r)^{\Delta/\Gamma}\cdot|\eta_L(S)|}$, where $\Delta > 0$ is the increase of the objective function in the corresponding transition;
- $D_S(k-r)$.

The same structure, with exchanged sets $\mathcal{M}_s\setminus M_0$ and $M_0$, is valid for $A_j(S^0,t)$, which is part of the value of the coefficient $\mu(S^0,t)$.

Consider the factor that is generated according to Lemma 8 during the transition $S_u \to S_{u+1}$. We recall that there are two different types of factors corresponding to transitions $S_u \to S_u = S_{u+1}$: the factor $D_{S_u}$ and the factor $1/|\eta_L(S_u)|$. The transitions $S_u \to S_u$ are called self-transitions. Starting with $S = S_0$, the expansions are performed until the values $A_1$ have been reached. We denote by $\mathcal{P}_S(j,t)$ a single product which corresponds to a particular path of transitions. Here, $sg(S_q) = +1$ for $S_q\notin M_0$, and $sg(S_q) = -1$ otherwise. Based on Lemma 8, $A_j(S,t)$ can be expressed by

$$A_j(S,t) = \sum_{i\in J}\mathcal{P}^i_S(j,t),$$

where $|J|$ depends on the number of neighbors at any step of the expansions.

Since we are interested in absolute values of differences of $\nu_S(k)$, cf. (35), it is sufficient to consider upper bounds for the absolute values of $A_j(S,t)$. The types of factors are the same in positive and negative products $\mathcal{P}_S(j,t)$, because the positive summands are recursively generated from negative summands and vice versa. Hence, for an upper bound of $|A_j(S,t)|$ we can consider w.l.o.g. only the positive summands of $A_j(S,t) = \sum_{i\in J}\mathcal{P}^i_S(j,t)$. This applies to $S\notin M_0$ as well as to $S'\in M_0$. Thus, we obtain

$$|A_j(S,t)| \le \sum_{i\in J}\big|\mathcal{P}^i_S(j,t)\big|. \tag{27}$$

Since we consider absolute values only, we use the notation $\bar{\mathcal{P}}_S(j,t) := |\mathcal{P}_S(j,t)|$, i.e., the sign $sg(S_{t-1})$ is deleted from the product. The computation of the products $\bar{\mathcal{P}}_S(j,t)$ can be represented by a tree $\mathcal{T}(S,j,t)$:

- the root is $A_j(S,t)$, and the edges denote the transitions $S_u\to S_{u+1}$;
- the internal nodes are marked by the corresponding $S_u$;
- the node $S_u$ leads to $|\eta_L(S_u)|+1$ nodes at a greater distance from the root, where the $+1$ results from the two types of factors generated by self-transitions.

The edges are marked by the corresponding factor from

$$\frac{1}{|\eta_L(S_u)|}\cdot\frac{1}{(k+2-t+u)^{(Z(S_u)-Z(S))/\Gamma}}, \qquad \frac{1}{|\eta_L(S_u)|}, \qquad\text{or}\qquad D_{S_u}(k-t+u), \tag{28}$$

with $|\eta_L(S_u)|$ in the denominator. The leaves are marked by $A_1(S_{t-1},1)$.

We note that the products $\bar{\mathcal{P}}_S(j,t)$ contain $(t-1)$ factors. We classify the products mainly by the number $a \le t-1$ of self-transitions $S_u\to S_u$, and particularly by the number of factors $D_{S_u}$. For the other types of transitions, let

- $b$ denote the number of transitions from $L_h$ to a higher neighboring level $L_{h'}$, $h'>h$;
- $c$ denote the number of transitions to a lower level $L_{h''}$;
- $d$ denote the number of transitions to the same level $L_h$.

We set $v := [a,b,c,d]$ and denote by $\bar{\mathcal{P}}_S[j,t,v]$ a product containing $a$ factors $D_{S_u}(k-t_u)$ and $b$, $c$, $d$ factors of the corresponding type. A given $v$ induces a fixed subtree $\mathcal{T}_v$ of $\mathcal{T}$.

Lemma 9 The position of a particular transition is independent of k and only 
defined by the structure of the entire computation tree T. 

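According to (27) we have

$$|A_j(S,t)| \le \sum_{a=0}^{t-1-b-c-d}\;\sum_{b=0}^{t-1-c-d}\;\sum_{c=0}^{t-1-d}\;\sum_{d=0}^{t-1}\;\sum_{i\in\Pi(\mathcal{T}_v)}\bar{\mathcal{P}}^i_S[j,t,v],$$

where $\Pi(\mathcal{T}_v)$ enumerates the products (paths) in the subtree $\mathcal{T}_v$.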

By Lemma 3, the total increase of the objective function is upper bounded by $b\cdot\max\{p(w)+p(v)\} = b\cdot\omega$. Therefore, to reach $M_0$ or $M_1$ at most $b\cdot\omega$ decreasing steps are necessary, i.e., $c\le b\cdot\omega$. Since

$$b = (t-1) - a - c - d, \tag{29}$$

we obtain from $c \le b\cdot\omega$ the relations $b \ge (t-1) - a - b\cdot\omega - d$ and $b\cdot(1+\omega)\ge (t-1)-a-d$, i.e.,

$$b \ge \frac{1}{1+\omega}\cdot\big((t-1)-a-d\big). \tag{30}$$
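The step from $c \le b\cdot\omega$ to (30) uses only (29). A brute-force check over all splittings $a+b+c+d = t-1$ makes this explicit. The function name and the ranges of $t$ and $\omega$ tried here are arbitrary choices for illustration.

```python
def check_bound_30(t, omega):
    """Check (30) for every (a, b, c, d) with a+b+c+d = t-1 and c <= b*omega.

    Returns the number of admissible tuples; raises AssertionError if (30)
    fails for any of them.
    """
    admissible = 0
    for a in range(t):
        for c in range(t - a):
            for d in range(t - a - c):
                b = (t - 1) - a - c - d  # relation (29)
                if c <= b * omega:       # at most b*omega decreasing steps
                    admissible += 1
                    assert b >= ((t - 1) - a - d) / (1 + omega)  # relation (30)
    return admissible


for omega in (1, 2, 5):
    print(omega, check_bound_30(10, omega))
```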



If $\bar{\mathcal{P}}_S[j,t,v]$ contains $b$ transitions to higher levels, the product of the $b$ corresponding factors $f(S_u,S'_u,t_u)$ can be upper bounded by

$$\prod_{u=0}^{b-1} f(S_u,S'_u,t_u) \le \prod_{u=0}^{b-1}\frac{1}{(k+2-t_u)^{(Z(S_u)-Z(S'_u))/\Gamma}}. \tag{31}$$

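Given a particular product $\bar{\mathcal{P}}_S(j,t)$, the $a$ factors from self-transitions can be considered together, and one obtains the upper bound

$$\prod_{u=1}^{a}\Big(1 - \sum_{i=1}^{p(S)}\frac{1}{|\eta_L(S)|}\cdot\frac{1}{(k+2-t+u)^{(Z(S_i)-Z(S))/\Gamma}}\Big). \tag{32}$$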

The product becomes larger if only a single transition to higher levels is chosen in any factor. We note that any choice of a single transition is possible. Therefore, it is possible to choose a consecutive chain of increasing values of the objective function $[Z(S_{i_{l-1}}) - Z(S_{i_l});\ Z(S_{i_{l-2}}) - Z(S_{i_{l-1}});\ \ldots]$. The product is subdivided into sequences of factors which belong to chains of an increasing objective function. We denote by

$$l_{\max} := \max\big\{l : [Z(S_{i_{l-1}}) - Z(S_{i_l});\ \ldots;\ Z(S_{i_0}) - Z(S_{i_1})] \text{ within } \mathcal{F}\big\} \tag{33}$$
the maximum possible length of an increasing chain within the configuration space $\mathcal{F}$. We obtain a further increase of the product (32) if any $k+2-t+u$ is substituted simply by $k \ge 1$. The factors which belong to an increasing chain of length $l_j \le l_{\max}$ are considered together, and by $x$ we denote the number of subdivisions. From (26) we have, for $l_j \le l_{\max}$,

$$\prod_{u=1}^{a}\Big(1 - \sum_{i=1}^{p(S)}\frac{1}{|\eta_L(S)|}\cdot\frac{1}{(k+2-t+u)^{(Z(S_i)-Z(S))/\Gamma}}\Big) \le e^{-(\cdots)}. \tag{34}$$



Now, we incorporate (31) and (34) into upper bounds of $\sum_{i\in\Pi(\mathcal{T}_v)}\bar{\mathcal{P}}^i_S[j,t,v]$ for a fixed $v$. Here we take into account that, except for self-transitions of the first type, the corresponding factors are divided by $|\eta_L(S)|$.



Lemma 10 Given $S \in \mathcal{F}$ and $k > a > 0$, then

$$\sum_{i\in\Pi(\mathcal{T}_v)}\bar{\mathcal{P}}^i_S[j,t,v] \le e^{-(\cdots)}\cdot\Big(1 - \frac{1}{\cdots}\Big)^{t-a-b}.$$

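Based on the upper bound for the particular products, we can derive an upper bound for the total sum:

Lemma 11 Given $S \in \mathcal{F}$, $k > \ldots$ and $t > O(l_{\max}\cdots)$, then

$$|A_j(S,t)| \le \sum_{a}\sum_{b}\sum_{c}\sum_{d}\sum_{i\in J_v}\bar{\mathcal{P}}^i_S[j,t,v] < n\cdot e^{-(\cdots)},$$

where $\gamma > 1$ and $p > 0$ are suitable constants.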



The proof is based on the following three cases, where we assume $t > k/\gamma$:

- $a \ge t/4 > k/(4\cdot\gamma)$;
- $a < t/4$ and $d < 2\cdot a$;
- $a < t/4$ and $d \ge 2\cdot a$.

Now, we compare the computation of $\nu(S,t)$ (and $\mu(S^0,t)$) for two different values $t = k_1$ and $t = k_2$, i.e., $\nu(S,t)$ is calculated backwards from $k_1$ and from $k_2$, respectively. To distinguish between $\nu(S,t)$ and related values, which are defined for different $k_1$ and $k_2$, we use an additional upper index. At this point, we again use the representation of $\nu(S,t)$ from Lemma 7 (and the corresponding equation for $\mu(S^0,t)$).

Lemma 12 Given $k_2 > k_1$ and $S\in M_i$, then

$$A_1^{k_1}(S,t) = A_1^{k_2}(S, k_2-k_1+t), \quad\text{if } t \ge i+2.$$



The proposition can be proved by induction over $i$, i.e., over the sets $M_i$. Lemma 12 implies that at step $s+2$ (with respect to $k_1$)

$$A_1^{k_1}(S, s+2) = A_1^{k_2}(S, k_2-k_1+s+2) \quad\text{for all } S\in\mathcal{M}_s.$$

For $A_1(S,t)$, the corresponding equality is already satisfied in case of $i > s$. The relation can be extended by induction to all values $j \ge 2$:
Lemma 13 Given $k_2 > k_1$, $j > 1$, and $S\in M_i$, then

$$A_j^{k_1}(S,t) = A_j^{k_2}(S, k_2-k_1+t), \quad\text{if } t \ge 2\cdot(j-1)+i.$$

We recall that our main goal is to upper bound the sum $\sum_{S\notin M_0} a_S(k)$. If $a(0)$ denotes the initial probability distribution, we have from (24) for the two values $k_2 > k_1$

$$\Big|\sum_{S\notin M_0}a_S(k_1) - \sum_{S\notin M_0}a_S(k_2)\Big| \le \Big|\sum_{S\notin M_0}\big(\nu(S,k_2)-\nu(S,k_1)\big)\cdot a_S(0)\Big| + \Big|\sum_{S'\in M_0}\big(\mu(S',k_1)-\mu(S',k_2)\big)\cdot a_{S'}(0)\Big|. \tag{35}$$

Lemma 14 Given the parameter $l_{\max}$, which characterizes the maximum number of consecutive transition moves that increase the value of the objective function, there exist constants $p>0$ and $c>1$ such that $k_2 > k_1 > \ldots$ implies

$$\Big|\sum_{S\notin M_0}\big(\nu(S,k_2)-\nu(S,k_1)\big)\cdot a_S(0)\Big| < 2\cdot 2^{-k_1^{1/p}/c}.$$

For $\big|\sum_{S'\in M_0}\big(\mu(S',k_1)-\mu(S',k_2)\big)\cdot a_{S'}(0)\big|$ we obtain the corresponding upper bound in the same way.

Theorem 6 The stochastic algorithm defined by (5), (6), and (9) computes for the job shop scheduling problem, after

$$k > (c\cdot\log(6/\delta))^{p} + \ldots$$

steps of the inhomogeneous Markov chain, a schedule $S$ such that

$$\sum_{S\notin M_0} a_S(k) < \delta \quad\text{and therefore}\quad \sum_{S\in M_0} a_S(k) > 1-\delta.$$

Therefore, the probability that after $k$ steps a schedule $S$ has a makespan of minimum length is larger than $1-\delta$.

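Proof: We choose $k > \ldots$ and $t > O(l_{\max}\cdots)$ in accordance with Lemmas 11 and 14. Furthermore, we have

$$\sum_{S\notin M_0}a_S(k) = \sum_{S\notin M_0}\big(a_S(k)-a_S(k_2)\big) + \sum_{S\notin M_0}a_S(k_2) = \sum_{S\notin M_0}\big(\nu(S,k_2)-\nu(S,k)\big)\cdot a_S(0) + \sum_{S'\in M_0}\big(\mu(S',k)-\mu(S',k_2)\big)\cdot a_{S'}(0) + \sum_{S\notin M_0}a_S(k_2).$$

The bound of Lemma 14 holds for every $k_2 > k_1 = k$ and does not depend on $k_2$. We can therefore take a $k_2 > k$ such that $\sum_{S\notin M_0}a_S(k_2) < \delta/3$. Suppose, in addition, that both differences $\sum_{S\notin M_0}(\nu(S,k_2)-\nu(S,k))\cdot a_S(0)$ and $\sum_{S'\in M_0}(\mu(S',k)-\mu(S',k_2))\cdot a_{S'}(0)$ are smaller than $\delta/3$. Then we obtain the stated inequality.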
Lemma 14 implies that the condition on the differences is satisfied in case of $2\cdot 2^{-k^{1/p}/c} < \delta/3$, which is valid if $(c\cdot\log(6/\delta))^{p} < k$.

q.e.d.
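The final step of the proof is plain arithmetic. This sketch reads the bound of Lemma 14 as $2\cdot 2^{-k^{1/p}/c}$, with the logarithm to base 2. That is the form consistent with the condition $(c\cdot\log(6/\delta))^p < k$ stated above. It computes the smallest admissible integer $k$ for given $\delta$, $c$ and $p$. The text does not specify the constants $c > 1$ and $p > 0$, so the values used here are placeholders.

```python
import math


def min_steps(delta, c, p):
    """Smallest integer k with (c * log2(6 / delta)) ** p < k."""
    return math.floor((c * math.log2(6 / delta)) ** p) + 1


def lemma14_bound(k, c, p):
    """2 * 2 ** (-k ** (1/p) / c): bound on each difference (as read here)."""
    return 2 * 2 ** (-(k ** (1 / p)) / c)


for delta in (0.1, 0.01):
    k = min_steps(delta, c=1.5, p=2.0)
    print(delta, k, lemma14_bound(k, 1.5, 2.0) < delta / 3)
```

Since $k > (c\log_2(6/\delta))^p$ gives $k^{1/p}/c > \log_2(6/\delta)$, each difference is below $2\cdot\delta/6 = \delta/3$, as the proof requires.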

If $l_{\max} < n/\mathrm{const}$, Theorem 6 implies an exponential lower bound for the number of steps of the inhomogeneous Markov chain. The theoretical result was used in a simulated annealing procedure to attack well-known benchmark problems. The value $l_{\max}$ was estimated by computational experiments for the YN and the SWV benchmark problems. These estimates of $l_{\max}$ were used as the parameter $\Gamma$ in the logarithmic cooling schedule (9). We could improve five upper bounds for the large unsolved benchmark problems YN1, YN4, SWV12, SWV13, and SWV15. The maximum improvement was achieved for SWV13, where the gap between the lower and the former upper bound shrinks by about 57%. The results are shown in Table 1.



Instance   J × M     LB     UB     t → ∞           Time
YN1        20 × 20   826    888    886     894     70267
YN2        20 × 20   861    909    910     918     63605
YN3        20 × 20   827    894    899     904     61826
YN4        20 × 20   918    972    970     975     63279
SWV11      50 × 10   2983   3005   3017    3149    18271
SWV12      50 × 10   2972   3038   3012    3188    28112
SWV13      50 × 10   3104   3146   3122    3289    33048
SWV15      50 × 10   2885   2940   2924    3088    35477



Table 1. Results on still unsolved problems YN and SWV 



LB denotes the lower and UB the upper bounds known from the OR-Library. Five of these upper bounds (YN1, YN4, SWV12, SWV13, and SWV15) could be improved, and the gap between LB and UB has been shortened by about 57% on SWV13. The Time column gives the run-time of our simulated annealing procedure in seconds on a Sun Ultra 1/170 SPARC machine.
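The 57% figure can be reproduced from Table 1. The sketch assumes that the first result column holds the new upper bounds. This is the only reading under which exactly the five instances named above improve on UB. The gap reduction is computed as $(UB - \text{new})/(UB - LB)$.

```python
def gap_reduction(lb, ub, new_ub):
    """Fraction of the gap between LB and the old UB closed by new_ub."""
    return (ub - new_ub) / (ub - lb)


# (LB, UB, new upper bound) for the improved instances of Table 1.
IMPROVED = {
    "YN1": (826, 888, 886),
    "YN4": (918, 972, 970),
    "SWV12": (2972, 3038, 3012),
    "SWV13": (3104, 3146, 3122),
    "SWV15": (2885, 2940, 2924),
}

for name, row in IMPROVED.items():
    print(f"{name}: gap reduced by {gap_reduction(*row):.1%}")
```

For SWV13 this gives $24/42$, i.e. about 57%, the largest reduction among the five instances.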
Abstract. A new approximate algorithm (GBA) for the Steiner problem in graphs (SPG) is presented. It is based on an iterative execution of a previous heuristic for the problem (SPH). GBA looks for a subset of vertices which, used as terminals, can produce a near-optimal solution. In addition, the tree associated with each of these subsets is selected at random. The worst-case time complexity of the algorithm is $O(|V|^3)$, and the cost of the solution is guaranteed to be less than twice the optimal. GBA is tested on classical benchmark problems. Its performance compares favorably to that of some of the best existing SPG approaches with respect both to solution quality and especially to runtime.



1 Introduction 

The SPG asks for the shortest tree interconnecting a subset of vertices in a connected graph. It is one of the classical NP-hard problems in combinatorics, and it is very useful in the design of several kinds of communication, distribution and transportation systems. In particular, the wire routing phase in VLSI design can be formulated in terms of the Steiner problem. The SPG admits a substantial variety of exact and heuristic algorithms which produce optimal or near-optimal solutions. Exact methods are only useful for solving small instances of the problem. For larger instances, well-known heuristic techniques must be used in order to attain an approximation to the optimal solution.

There are two main criteria for evaluating the quality of heuristic algorithms.

- Under the first, the best algorithm ensures the lowest performance ratio bound (PRB), which is the ratio between the worst approximated solution cost and the optimal solution cost. Here, the algorithm of Karpinski and Zelikovsky [7] is the best up to now, with a PRB equal to 1.644 (the solution cost will not be more than 64.4% above the optimum).
- Under the second, the best algorithm performs empirically better on benchmark problem instances. Here, genetic algorithms [5, 4, 10] have produced the best experimental results up to now on the widely-used graph instances of the OR-library [2].

Both criteria can be considered from a critical perspective. On the one hand, PRBs are often quite difficult to prove, and they are usually close enough to assure better empirical performance. On the other hand, the graph instances might not be representative enough, and the number of experiments may be too small to consider the results statistically significant for the problem. In fact, the algorithms which compete under the empirical criterion are usually based on combinations of heuristics with a good PRB. They evaluate a relatively high number of candidate solutions and return the best one they find. In most cases, this solution is nearer to the optimum than these bounds would lead one to expect.

Our aim is to introduce a new approximate algorithm (GBA) based on the well-known shortest path heuristic (SPH) proposed by Takahashi and Matsuyama [9]. In GBA, SPH is applied iteratively: the vertices of the tree attained in the previous iteration are used as the subset of vertices to be connected in the SPG. This way, our algorithm can obtain solutions which could not be obtained by the SPH itself. In other words, GBA increases the exploration power of SPH, because it considers as candidate solutions some trees that are out of reach of the SPH.

This work is organized as follows: SPG notation and some basic proper- 
ties are presented in section 2. Section 3 describes SPH and introduces a new 
hill-climbing algorithm (HCSPH) based on this heuristic. In section 4, GBA is 
introduced by incorporating new elements to HCSPH in order to increase its 
performance. Experimental results are given in section 5 and, finally, section 6 
ends the exposition with some conclusions. 



2 SPG Formulation 

Let $G = (V, E, c)$ be an undirected graph with set of vertices $V$, set of edges $E$ and a cost function $c: E \to \mathbb{R}^+$ defined on the edges. Let $N \subseteq V$ be a non-empty set of vertices called terminals. The SPG is to find a minimum cost subgraph of $G$ which interconnects all the vertices in $N$. The cost $c(G)$ of a graph $G$ is the sum of the costs of its edges.

A Steiner tree of $G$ for $N$, denoted $StT(G,N)$, is a tree of $G$ interconnecting $N$. Since the costs of the edges are all positive, the SPG is to find a minimum cost $StT(G,N)$, denoted $MStT(G,N)$. Vertices of a given $StT(G,N)$ that are not in $N$ (non-terminals) are called Steiner vertices. Those Steiner vertices not used to connect $N$ are called ties. In other words, a tie is a Steiner vertex with degree one, or a Steiner vertex that changes to degree one by recursive deletion of ties. (It is clear that an $MStT(G,N)$ cannot have ties.) The function $trim(T)$ returns the tree $T$ once its ties have been deleted, or trimmed.
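A minimal sketch of $trim(T)$ follows. It assumes, for illustration, that the tree is stored as a dictionary mapping each vertex to the set of its neighbours. Ties are removed by repeatedly deleting non-terminal leaves. This also catches Steiner vertices that become leaves only after earlier deletions. The example tree mimics the situation of Fig. 1, with an assumed edge layout.

```python
def trim(tree, terminals):
    """Return trim(T): T without its ties (non-terminals that end up as leaves).

    tree: dict mapping each vertex of T to the set of its neighbours in T.
    terminals: the terminal set N.
    """
    adj = {v: set(nbrs) for v, nbrs in tree.items()}  # work on a copy
    stack = [v for v, nbrs in adj.items() if len(nbrs) <= 1 and v not in terminals]
    while stack:
        v = stack.pop()
        if v not in adj:  # already deleted
            continue
        for u in adj.pop(v):
            adj[u].discard(v)
            if len(adj[u]) <= 1 and u not in terminals:
                stack.append(u)
    return adj


# Assumed layout: s2 joins the three terminals, s1 hangs off s2 as a tie.
T = {
    "v1": {"s2"}, "v2": {"s2"}, "v3": {"s2"},
    "s2": {"v1", "v2", "v3", "s1"},
    "s1": {"s2"},
}
print(sorted(trim(T, {"v1", "v2", "v3"})))
```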

A minimum spanning tree of $G$ for $U\subseteq V$, denoted $MSpT(G,U)$, is a minimum cost tree of $G$ with set of vertices $U$. It can be found by Prim's $O(|U|^2)$ or Kruskal's $O(|E_U|\log|E_U|)$ algorithm, where $E_U\subseteq E$ is the subset of edges of the subgraph of $G$ induced by $U$. An $MStT(G,N)$ with set of Steiner vertices $S$ is also an $MSpT(G, N\cup S)$. Hence, the SPG can be solved by finding the $S\subseteq V\setminus N$ that minimizes the cost of $MSpT(G, N\cup S)$. The number of Steiner candidate subsets is $2^{|V|-|N|}$.
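This reduction of the SPG to the choice of a Steiner set $S$ yields an exact, but exponential, procedure for tiny graphs. Enumerate all $2^{|V|-|N|}$ subsets $S \subseteq V\setminus N$ and keep the cheapest $MSpT(G, N\cup S)$. The sketch below uses Prim's $O(|U|^2)$ algorithm on the induced subgraph. The adjacency-dictionary format and the small example graph are assumptions for illustration.

```python
import math
from itertools import combinations


def mspt_cost(graph, vertices):
    """Cost of MSpT(G, U) via Prim on the subgraph induced by U (inf if disconnected).

    graph: dict u -> dict v -> positive edge cost (undirected, symmetric).
    """
    remaining = set(vertices)
    start = remaining.pop()
    # best[v]: cheapest edge from the current tree to v
    best = {v: graph[start].get(v, math.inf) for v in remaining}
    total = 0
    while best:
        v = min(best, key=best.get)
        if best[v] == math.inf:
            return math.inf  # induced subgraph is not connected
        total += best.pop(v)
        for u in best:
            best[u] = min(best[u], graph[v].get(u, math.inf))
    return total


def exact_spg_cost(graph, terminals):
    """min over S subset of V - N of c(MSpT(G, N u S)); 2**(|V|-|N|) candidates."""
    others = [v for v in graph if v not in terminals]
    return min(
        mspt_cost(graph, set(terminals) | set(steiner))
        for r in range(len(others) + 1)
        for steiner in combinations(others, r)
    )


# Three terminals pairwise 2 apart, plus a Steiner vertex s at distance 1 from each.
G = {
    "a": {"b": 2, "c": 2, "s": 1},
    "b": {"a": 2, "c": 2, "s": 1},
    "c": {"a": 2, "b": 2, "s": 1},
    "s": {"a": 1, "b": 1, "c": 1},
}
print(exact_spg_cost(G, {"a", "b", "c"}))
```

Without the Steiner vertex the best spanning tree of the terminals costs 4. Adding $s$ lowers the cost to 3.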

In Figure 1 there is an example that summarizes this notation. 
$G = (V, E, c)$
$N = \{v_1, v_2, v_3\}$ terminals
$V - N = \{s_1, s_2\}$ non-terminals
$T$ is $StT(G, N)$ and $MSpT(G, N \cup \{s_1, s_2\})$
$s_2$ is a Steiner vertex and $s_1$ is a tie
$trim(T) = T - \{s_1\}$ is $MStT(G, N)$

Fig. 1. The Steiner Problem in Graphs



The distance graph $D_G$ of $G$ is the complete graph with set of vertices $V$ in which the cost of every edge $(v_i, v_j)$ is the cost of the shortest path between $v_i$ and $v_j$ in $G$. Every $MStT(G,N)$ is also an $MStT(D_G,N)$. Moreover, given the latter, we can obtain the former by replacing every edge by its associated shortest path in $G$. There is an $MStT(D_G,N)$ with at most $k = |N|-2$ Steiner vertices. Hence, $D_G$ can be used instead of $G$ in order to reduce the number of subsets of Steiner vertices to be considered.
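The distance graph $D_G$ can be obtained with Floyd's all-pairs shortest path algorithm. The text later uses this algorithm as an $O(|V|^3)$ preprocessing step. A minimal sketch, assuming the graph is stored as a dictionary of dictionaries (vertex, then neighbour, then edge cost):

```python
import math


def distance_graph(graph):
    """D_G as a dict of dicts: d[u][v] = cost of a shortest u-v path in G.

    graph: dict u -> dict v -> positive edge cost (undirected).
    Floyd-Warshall, O(|V|^3); unreachable pairs get math.inf.
    """
    nodes = list(graph)
    d = {u: {v: 0 if u == v else graph[u].get(v, math.inf) for v in nodes} for u in nodes}
    for w in nodes:  # allow w as an intermediate vertex
        for u in nodes:
            if d[u][w] == math.inf:
                continue
            for v in nodes:
                if d[u][w] + d[w][v] < d[u][v]:
                    d[u][v] = d[u][w] + d[w][v]
    return d


# The path a-b-c is cheaper than the direct edge a-c.
G = {"a": {"b": 1, "c": 5}, "b": {"a": 1, "c": 2}, "c": {"a": 5, "b": 2}}
D = distance_graph(G)
print(D["a"]["c"])
```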

3 HCSPH Algorithm 

The SPH is one of the classical approximation algorithms for the SPG. Its worst-case time complexity is $O(|N||V|^2)$. It returns an approximation $SPH(G,N)$ whose cost is never greater than twice the cost of the optimal solution. (The PRB of the SPH is $2 - 1/|N|$ [9].) The SPH works as follows:

- Step 1. Let $T$ be a subtree formed by an isolated terminal. Take $T$ as an initial partial solution.

- Step 2. Find a terminal $v$, not yet in $T$, closest to a vertex $v'$ in $T$. Add to $T$ the shortest path joining $v$ to $v'$.

- Step 3. Repeat Step 2 until $T$ contains all the terminals ($|N|-1$ times).

It is easily seen that SPH is closely related to Prim’s algorithm for the MSpT 
problem. In fact, the SPH grows a single subtree which is expanded during each 
iteration by the addition of a closest terminal v together with the non-terminals 
on the shortest path from v to the subtree. 
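Steps 1 to 3 can be sketched in Python as follows. This is one possible implementation, not necessarily the one used by the authors. Instead of precomputed shortest paths, it runs a multi-source Dijkstra search from the current tree at each iteration. The search stops at the first terminal outside the tree, which is a closest one, and the path found is added. Ties between equally close terminals are broken by the heap order, which is exactly the kind of arbitrary vertex-order dependence discussed in Section 4.1. The graph format and the `root` parameter are assumptions.

```python
import heapq
import math


def sph(graph, terminals, root=None):
    """Shortest path heuristic: grow one tree by repeatedly attaching the
    closest terminal not yet in it through a shortest path.

    graph: dict u -> dict v -> positive edge cost (undirected).
    Returns the tree as a set of edges, each edge a frozenset {u, v}.
    """
    terminals = set(terminals)
    tree = {root if root is not None else min(terminals)}  # Step 1
    edges = set()
    while not terminals <= tree:                             # Step 3
        dist = {v: 0 for v in tree}
        prev = {}
        heap = [(0, v) for v in tree]
        heapq.heapify(heap)
        while heap:                                          # Step 2
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry
            if u in terminals and u not in tree:
                break     # u is a closest terminal outside the tree
            for v, w in graph[u].items():
                if d + w < dist.get(v, math.inf):
                    dist[v], prev[v] = d + w, u
                    heapq.heappush(heap, (d + w, v))
        else:
            raise ValueError("the terminals are not connected in G")
        while u not in tree:  # add the shortest path back to the tree
            edges.add(frozenset((u, prev[u])))
            tree.add(u)
            u = prev[u]
    return edges


G = {
    "a": {"b": 3, "c": 3, "s": 1},
    "b": {"a": 3, "c": 3, "s": 1},
    "c": {"a": 3, "b": 3, "s": 1},
    "s": {"a": 1, "b": 1, "c": 1},
}
tree = sph(G, {"a", "b", "c"})
print(sorted(tuple(sorted(e)) for e in tree))
```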

Rayward-Smith and Clare [8] noticed that the tree given by this heuristic is not necessarily an $MSpT$ of its own vertices. They proposed the following two steps to further improve the solution attained by SPH:

- Step 4. $T_1 := MSpT(G, vertices(T))$.

- Step 5. $T_2 := trim(T_1)$ is the approximate solution.
These two steps do not alter the complexity of the method, which depends on the computation of the shortest paths. When the initial tree $T$ is not improved by Steps 4 or 5, two conditions must hold: $T$ is already an $MSpT(G, vertices(T))$, and $T_1$ does not have ties. The distribution of ties depends on the vertex ordering, that is to say, on the way in which MSpT and SPH build their trees. If $T_2$ is shorter than $T$, a new improvement can be attained by considering the Steiner vertices of $T_2$ as terminal vertices and starting the whole heuristic again. This is the main idea behind the following new hill-climbing algorithm, HCSPH:

- Step 0. $V_N := N$.

- Step 1. $T_1 := SPH(G, V_N)$; $T_2 := trim(T_1)$.

- Step 2. $T_3 := MSpT(G, vertices(T_2))$; $T_4 := trim(T_3)$.

- Step 3. If $c(T_1) > c(T_4)$, then $V_N := vertices(T_4)$ and go to Step 1; else $T_1$ is the approximate solution.
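Steps 0 to 3 of HCSPH are sketched below. To keep the block self-contained, it includes compact versions of SPH (multi-source Dijkstra), a naive Prim's algorithm for $MSpT$ on the induced subgraph, and $trim$. All helper names and the edge-set representation are assumptions. Ties are always trimmed with respect to the original terminal set $N$, while SPH connects the current set $V_N$.

```python
import heapq
import math


def cost(graph, edges):
    return sum(graph[u][v] for u, v in (tuple(e) for e in edges))


def sph(graph, terminals):
    """Compact shortest path heuristic; returns a set of frozenset edges."""
    terminals, edges = set(terminals), set()
    tree = {min(terminals)}
    while not terminals <= tree:
        dist, prev, heap = {v: 0 for v in tree}, {}, [(0, v) for v in tree]
        heapq.heapify(heap)
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue
            if u in terminals and u not in tree:
                break
            for v, w in graph[u].items():
                if d + w < dist.get(v, math.inf):
                    dist[v], prev[v] = d + w, u
                    heapq.heappush(heap, (d + w, v))
        else:
            raise ValueError("terminals not connected")
        while u not in tree:
            edges.add(frozenset((u, prev[u])))
            tree.add(u)
            u = prev[u]
    return edges


def mspt(graph, vertices):
    """Naive Prim on the induced subgraph (assumed connected)."""
    vertices = set(vertices)
    in_tree, edges = {min(vertices)}, set()
    while in_tree != vertices:
        u, v = min(((x, y) for x in in_tree for y in graph[x]
                    if y in vertices and y not in in_tree),
                   key=lambda e: graph[e[0]][e[1]])
        in_tree.add(v)
        edges.add(frozenset((u, v)))
    return edges


def trim(edges, terminals):
    """Repeatedly delete edges ending in a non-terminal leaf (a tie)."""
    edges = set(edges)
    while True:
        degree = {}
        for e in edges:
            for v in e:
                degree[v] = degree.get(v, 0) + 1
        ties = {v for v, k in degree.items() if k == 1 and v not in terminals}
        if not ties:
            return edges
        edges = {e for e in edges if not (e & ties)}


def vertices_of(edges, fallback):
    return set().union(*edges) or set(fallback)


def hcsph(graph, terminals):
    v_n = set(terminals)                                  # Step 0
    while True:
        t1 = sph(graph, v_n)                              # Step 1
        t2 = trim(t1, terminals)
        t3 = mspt(graph, vertices_of(t2, terminals))      # Step 2
        t4 = trim(t3, terminals)
        if cost(graph, t1) > cost(graph, t4):             # Step 3
            v_n = vertices_of(t4, terminals)
        else:
            return t1


G = {"a": {"b": 49, "x": 30}, "b": {"a": 49, "x": 20},
     "x": {"a": 30, "b": 20, "c": 30}, "c": {"x": 30}}
print(cost(G, sph(G, {"a", "b", "c"})), cost(G, hcsph(G, {"a", "b", "c"})))
```

On this example, SPH alone attaches $b$ directly to $a$ and returns a tree of cost 99. The minimum spanning tree over its vertices costs 80. HCSPH therefore restarts with $x$ as an additional terminal and stops at 80.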

HCSPH “inherits” the PRB of the SPH from the first iteration of the algorithm. It is not difficult to see that $c(T_1) \ge c(T_2) \ge c(T_3) \ge c(T_4)$, so HCSPH is indeed a hill-climbing algorithm. Moreover, it can be used as a hill-climber for any approximate solution given by another SPG heuristic. In that case, the set of vertices of this solution should be used instead of $N$ at Step 0.

Complexity Analysis. The worst-case time complexity per iteration of HCSPH is $O(|V_N||V|^2)$, due to the computation of the shortest paths. Since $V_N$ might contain all the vertices, the complexity of a single iteration becomes $O(|V|^3)$. Consequently, computing all the shortest paths beforehand should be considered. Once all the shortest paths have been computed by means of Floyd's $O(|V|^3)$ algorithm, the worst-case and the best-case time complexities per iteration are $O(|V|^2)$ and $O(|N|^2)$, respectively.

The worst-case time complexity of HCSPH is $O(|V|^3 + k\cdot|V|^2)$, where $k$ is the number of iterations. Since the cost of the solution is improved in each iteration, we can assume that $k$ will never be greater than $|V|$ (or $c|V|$ for a small constant $c$). Therefore, the complexity of HCSPH is $O(|V|^3)$.

It is important to point out the following. If the complexity of computing all shortest paths cannot be reduced, HCSPH has the lowest complexity that a competitive SPG heuristic (a heuristic aiming to attain the best possible solution) can achieve. In order to be competitive, the graph has to be pre-processed to reduce its size, and several graph reduction rules have been proposed for this purpose [6]. All the shortest paths are required even for some of the most basic reduction rules. Hence, the complexity of HCSPH cannot be lower than that of the shortest paths computation. Moreover, the graph may already have been reduced. In that case, the number of vertices used to evaluate the complexity of one HCSPH iteration can be lower than the number of vertices of the initial (non-reduced) graph.

In the next section, the GBA algorithm is introduced by extending the basic HCSPH template in order to increase its performance.
4 GBA

There are only two ways in which an iteration of HCSPH can attain a shorter tree: by adding Steiner vertices via SPH, or by trimming tie vertices. Both ways depend on the SPH and MSpT implementations, on the vertex ordering, and also on the method by which the shortest paths are computed. We propose new elements with three aims: to decrease this dependency on the vertex ordering, to reduce the solution search space, and to increase the probability that both Steiner and tie vertices come up.

In the next subsections we explain the main decisions that have been taken to design GBA using HCSPH as a basis. A template of the final version of GBA is shown in the last of these subsections. These subsections thus describe in detail how the new elements have been incorporated into the algorithm.

4.1 Vertices Reordering

Given a subset of vertices $V_S$, we can generally find several $SPH(G,V_S)$ and also several $MSpT(G,V_S)$. However, the implementation of these functions is usually deterministic, so the tree produced for a given subset of vertices will always be the same. We do not have any a priori way to decide which of these trees is better for our purpose. In other words, the way in which SPH and MSpT select their respective solutions is arbitrary and unrelated to the design decisions behind the algorithms. As a result, HCSPH explores the space of candidate solutions for the SPG in a biased way, which produces different results even for isomorphic graphs.

GBA reorders the vertices at random before each computation of SPH and MSpT. This avoids the arbitrary bias and increases the exploratory power of GBA. These functions thus become non-deterministic, which brings several advantages:

- Given the same subset of terminals, these functions can return different trees, so GBA can effectively reach more trees.

- Consecutive subsets of terminals are likely to be very similar, and deterministic functions tend to return similar trees for them. This tendency is reduced. Consequently, the chances that tie vertices and Steiner vertices come up are increased.

- Two iterations starting with the same initial subset of terminals can produce different solutions, so the stop condition used for HCSPH can be improved. GBA stops when no better solution has been found after a fixed number of iterations. Using this last improvement time (LIT), GBA explores more candidate solutions in order to attain a better approximation.

- GBA avoids part of the vertex-order dependence present in most SPG algorithms. (As we show below, GBA does not avoid this dependency when the shortest paths are computed.) Consequently, the results obtained by GBA for a given graph carry over almost unchanged to any isomorphic graph.
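The stop rule based on the last improvement time (LIT) and the random reordering before each SPH/MSpT call can be sketched generically. The function names, the `iterate` callback and the choice to continue from the latest (rather than the best) solution are assumptions. The text does not fix these details.

```python
import random


def shuffled(vertices, rng):
    """Random vertex order, to be used before each SPH or MSpT computation."""
    order = list(vertices)
    rng.shuffle(order)
    return order


def run_until_lit(initial, iterate, cost, lit, seed=None):
    """Apply a randomized improvement step until `lit` consecutive
    iterations have not produced a better solution; return the best one."""
    rng = random.Random(seed)
    best = current = initial
    without_improvement = 0
    while without_improvement < lit:
        current = iterate(current, rng)
        if cost(current) < cost(best):
            best, without_improvement = current, 0
        else:
            without_improvement += 1
    return best


# Toy usage: "solutions" are integers and each step proposes a random value.
result = run_until_lit(100, lambda s, rng: rng.randint(0, 100), lambda s: s, lit=20, seed=1)
print(result)
```

In GBA, `iterate` would stand for one pass of SPH, MSpT and trimming on a freshly shuffled vertex order. Larger LIT values trade running time for more explored candidate trees.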
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4.2 Deleting Non- Terminals of Degree Two 

One of the ways in which HCSPH finds a shorter tree is by including new
Steiner vertices. Clearly, if these vertices have degree at least three, they
help to reduce the cost of the solution. This is not the case for Steiner
vertices of degree two. Observe that HCSPH looks for an approximate solution
in both G and D_G at the same time. Non-terminals of degree two are
completely unnecessary for shortening the tree in D_G.

The function dd2 deletes these non-terminals of degree two. In particular,
the subset Vs := dd2(T) contains the vertices of the tree T except its
non-terminals of degree two. In GBA this subset, rather than the vertex set
of T, is used as the initial subset of terminals for the next iteration.
Observe that a tree T' not larger than T can be constructed by connecting the
two neighbours of every deleted non-terminal by an edge whose cost equals the
distance between them. By means of dd2:

— The tree T' might be shorter than T and, at the same time, longer than
MSpT(D_G, Vs), so the chances of finding a shorter tree in the next iteration
are increased.

— The Steiner subsets to be considered are reduced to those with at most
|N| - 2 vertices, which is the maximum number of Steiner vertices that Vs can
contain, so the solution search space is reduced.

— The O(|Vs||V|) worst-case time complexity of SPH becomes O(|N||V|), so GBA
runs faster.
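The following is a sketch of dd2. It assumes the tree is given as a list of edges (u, v) and the terminal set N as a Python set. The function name follows the text, but its signature is hypothetical. Degrees are counted from the edge list, and every terminal is kept whatever its degree.

```python
from collections import Counter


def dd2(tree_edges, terminals):
    """Vertices of a tree minus its non-terminals of degree two.

    tree_edges: iterable of (u, v) pairs forming a tree.
    terminals:  the terminal set N of the Steiner problem."""
    degree = Counter()
    for u, v in tree_edges:
        degree[u] += 1
        degree[v] += 1
    terms = set(terminals)
    kept = {v for v, d in degree.items() if v in terms or d != 2}
    return kept | terms          # terminals always stay (covers a 1-vertex tree)
```

In the example used by the tests, vertex 6 is a degree-two non-terminal on the path 5-6-2 and is dropped. Vertex 5, with degree three, is kept as a Steiner vertex.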

In addition to these advantages, dd2 is also a selective way to “shake” the
tree in order to incorporate new Steiner vertices. This so-called shaking
effect is considered in the next subsection.

4.3 Shaking the Tree 

A sensitive point in GBA is the inclusion of a function that randomly applies
small perturbations to the terminal subset Vs at the end of each iteration.
This function allows each non-terminal vertex to be either included in Vs or
excluded from it with a very small probability. We regard these perturbations
as a way of shaking the tree, shake(Vs), so that new tie leaves fall down and
leave their place to new Steiner vertices.

Clearly, the shaking function increases the number of trees that GBA can
reach, even though this is not necessarily an advantage. In any case, this
function gives the algorithm an important property:

— Not all vertices can be incorporated into a tree by the HCSPH algorithm.
An example is a non-terminal vertex that is not an intermediate vertex of any
shortest path. Shaking the tree gives every vertex a chance to be added to
the terminal subset, so all candidate solutions are reachable. Consequently,
no optimal solution is ever excluded from consideration.
The shaking must not alter the basic structure of the tree. Consequently, we
derive the probability of shaking a vertex from the desired expected number
ES of shaken vertices in the whole tree. If we expect to change (add or
delete) ES vertices, the probability Ps(v) of shaking a vertex will be:

$$P_s(v) := \frac{ES}{|V| - |N|}, \qquad \forall v \in V - N.$$
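The shaking step can be sketched as follows. It assumes the current subset Vs, the vertex set V and the terminal set N are Python collections and that ES is a number; the signature is hypothetical. Each non-terminal is toggled independently with probability Ps(v). The expected number of changed vertices is therefore |V - N| · Ps(v) = ES, provided ES ≤ |V - N|.

```python
import random


def shake(vs, vertices, terminals, es, rng):
    """Return a perturbed copy of the terminal subset vs.

    Every non-terminal v is toggled (added if absent, removed if present)
    independently with probability Ps(v) = ES / (|V| - |N|); terminals are
    never touched."""
    non_terminals = [v for v in vertices if v not in terminals]
    shaken = set(vs)
    if not non_terminals:
        return shaken
    ps = es / len(non_terminals)           # Ps(v) = ES / (|V| - |N|)
    for v in non_terminals:
        if rng.random() < ps:
            shaken ^= {v}                  # symmetric difference toggles v
    return shaken
```

With a small ES, most calls change only a handful of vertices. This matches the requirement that shaking must not alter the basic structure of the tree.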



The shake function is the most unstable of all the elements that we have
incorporated into GBA, particularly because it makes it possible to explore
solutions that are larger than the best solution found so far. We also
considered shaking only when no shorter tree had been found in the last
iteration. Because the global results of our experiments improved slightly
when shaking at each iteration, we decided to incorporate the function in
that way. Nevertheless, more selective or careful ways of shaking the tree,
such as the dd2 function, could probably improve the algorithm's performance.



4.4 Giving Priority to the Paths Instead of the Edges 



The performance of path-based heuristics for the SPG is closely related to
the way in which the paths are constructed. For example, the SPH will never
incorporate as a Steiner vertex any non-terminal that does not appear as an
intermediate vertex of one of the computed shortest paths. To make matters
worse, when several paths of the same cost between two vertices are
available, the same arbitrary path is always chosen. This way, some vertices
can be arbitrarily excluded from consideration.

We have rejected reordering the vertices each time a path is required (as is
done when a tree is computed) because it has an important impact on the
algorithm's complexity. Our aim, then, is to reduce this new “out of control”
bias as far as possible by selecting the paths that are likely to be more
suitable. We would like as many non-terminal vertices as possible to be
incorporated as intermediate path vertices.

At present, this path selection strategy is at an early stage in GBA. Since
the way in which the paths are computed influences the empirical results to
some extent, we use a simple path selection strategy: whenever a choice
between a path and an edge of the same length is required, the path is
selected. This strategy is used in every GBA function as well as in the
computation of the paths.

Since this strategy barely alters the algorithm to which it is applied, we
consider it useful to introduce it into any path-based heuristic algorithm.
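One way to apply the "prefer the path" rule inside a single-source shortest path computation is to compare tentative labels lexicographically by (length, -number of edges). Among paths of equal length, the one with more edges, and hence more intermediate vertices, then wins. The sketch below applies this rule to Dijkstra's algorithm rather than to the all-pairs computation used in GBA, so it illustrates the idea rather than the authors' implementation. It assumes strictly positive weights, because zero-weight edges would break the lexicographic ordering. It also assumes exact (e.g. integer) lengths so that ties are detected reliably.

```python
import heapq


def dijkstra_prefer_paths(adj, source):
    """Shortest paths from source; among equal-length alternatives keep the
    one with more edges, i.e. with more intermediate vertices.

    adj maps each vertex to a list of (neighbour, weight) pairs; weights
    must be strictly positive. Returns (dist, parent)."""
    best = {source: (0, 0)}          # vertex -> (length, -number_of_edges)
    parent = {source: None}
    heap = [(0, 0, source)]
    while heap:
        d, neg_edges, u = heapq.heappop(heap)
        if (d, neg_edges) != best[u]:
            continue                 # stale heap entry, a better label exists
        for v, w in adj[u]:
            label = (d + w, neg_edges - 1)
            if v not in best or label < best[v]:
                best[v] = label
                parent[v] = u
                heapq.heappush(heap, (label[0], label[1], v))
    dist = {v: key[0] for v, key in best.items()}
    return dist, parent
```

In the test graph, the direct edge a-c and the path a-b-c both have length 2. The two-edge path through b is kept, so b becomes a candidate intermediate vertex.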
4.5 GBA Template

The following algorithm shows the final template of GBA. 

— Step 0. V_N := N.

— Step 1. T1 := SPH(G, reorder(V_N)); T2 := trim(T1).

— Step 2. T3 := MSpT(G, reorder(vertices(T2))); T4 := trim(T3).

— Step 3. if not stopCond(LIT) then V_N := shake(dd2(T4)); go to Step 1

else return the shortest tree found.
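The template can be read as a plain loop, sketched below with hypothetical callables. `build_tree` stands in for Steps 1-2 (SPH, trim, MSpT, trim) and returns a tree with its cost. `next_subset` stands in for shake(dd2(T4)) in Step 3. The stop condition is LIT, meaning the loop stops after `lit` consecutive iterations without improvement. It is combined with a hard cap `max_iters`, which the complexity analysis suggests setting to c|V|.

```python
def gba_loop(initial_subset, build_tree, next_subset, lit, max_iters):
    """Generic driver for the GBA template (hypothetical interface).

    build_tree(subset) -> (tree, cost)   # stands in for Steps 1-2
    next_subset(tree)  -> subset         # stands in for shake(dd2(T4))
    Returns the shortest tree found and its cost."""
    subset = initial_subset
    best_tree, best_cost = None, float("inf")
    since_improvement = 0                # iterations since the last improvement
    for _ in range(max_iters):
        tree, cost = build_tree(subset)
        if cost < best_cost:
            best_tree, best_cost = tree, cost
            since_improvement = 0
        else:
            since_improvement += 1
            if since_improvement >= lit:  # LIT stop condition
                break
        subset = next_subset(tree)
    return best_tree, best_cost
```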

Complexity Analysis. Two parameters have to be given before GBA can be
executed: the LIT used to stop the algorithm, and the expected number ES of
shaken vertices per tree. The worst-case time complexity per iteration of GBA
is O(|N||V| + |V2|^2), where |V2| is the number of vertices of T2. The first
term is related to the SPH computation and the second to the MSpT
computation. Since V2 could contain virtually all the vertices, the
complexity of a single iteration becomes O(|V|^2). (In fact, the best-case
time complexity is O(|N|^2).)

We have considered suppressing Step 2 in order to reduce the complexity to
O(|N||V|). In that case, T4 should be replaced by T2 at Step 3. There are two
main reasons to reject this possibility. On the one hand, when dd2 is applied
to a tree that is not an MSpT, some vertices that would produce a shorter
tree in the next step might be deleted. These vertices could then find it
difficult to be incorporated into a solution when dd2 is executed after each
tree computation. On the other hand, this worst-case time complexity does not
reflect the real behaviour of the algorithm. Assume the graph is not sparse
enough to use Kruskal's MSpT algorithm instead of Prim's. Even then, the
complexity upper bound is reached only when SPH incorporates most of the
non-terminals into T2 and, in addition, this tree has few ties. This is a
very unusual case, especially when the graph is not sparse. In practice, the
complexity of each iteration is usually bounded by the computation of the
SPH. Even in this case, the complexity bound does not reflect the real
behaviour of one iteration of GBA.

As with HCSPH, the worst-case time complexity of GBA is O(|V|^3 + k|V|^2),
where k is the number of iterations. Unlike in HCSPH, this number depends on
the LIT, so it is more difficult to bound. An O(|V|) number of iterations has
proved sufficient to produce the empirical results shown in the following
section. Consequently, we propose to limit the maximum number of iterations
to c|V|, for a very small constant c. This way, GBA achieves a worst-case
time complexity of O(|V|^3).
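The bound can be worked out explicitly. This assumes, as stated above, that the O(|V|^3) all-pairs shortest path computation is done once, that each iteration costs at most O(|V|^2), and that the number of iterations is capped at k ≤ c|V|:

$$
T_{\mathrm{GBA}}
= \underbrace{O(|V|^3)}_{\text{shortest paths}}
+ \underbrace{k}_{\le c|V|} \cdot \underbrace{O(|V|^2)}_{\text{one iteration}}
\le O(|V|^3) + c\,|V| \cdot O(|V|^2)
= O(|V|^3).
$$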

5 Experimental Results 

GBA has been tested on the 60 largest SPG instances from the OR-Library [2],
denoted c1,...,c20, d1,...,d20 and e1,...,e20. The c-graphs, d-graphs and
e-graphs have 500, 1000 and 2500 vertices, respectively.
The performance is compared to that of a genetic algorithm (GA) [5] that
clearly outperformed all the previously tested algorithms [3, 4, 10] in both
solution quality and runtime. In [3], up to 6 different algorithms had been
compared using the same graph instances under similar conditions. In [5], GA
performance was compared to the genetic algorithm proposed by H. Esbensen
[4], which had produced the best quality results in the shortest time.

Before GBA itself is executed, standard graph reduction techniques are used
to try to reduce the size of the given problem. Since the reduction routines
used are the ones produced by Esbensen himself [3], the comparisons have been
made with exactly the same problem instances.

The next four tables compare the experimental performance of GBA and GA. In
addition, Table 4 includes the quality results obtained by an iterative
algorithm (IKMB) [1], which is considered one of the best deterministic
heuristics. IKMB results have been taken from [3] and give an idea of the
difficulty of these problem instances. GA results have been taken from [5].
The parameter settings for GBA are LIT = 1000 and ES = …. Both algorithms
were implemented in the C programming language and run on a SUN Ultra 30
(250 MHz, 256 MB RAM). Tables 1, 2 and 3 give details of the experiments with
graphs c, d and e respectively, and Table 4 summarizes these results.



Table 1. Experimental results with graphs c

       Reduced size      Cost                         CPU-Time (secs)   Iterations
Graph  |V|   |N|  |E|    Opt.   GA         GBA        Red.  GA    GBA   Avg.  Max.
c1     145   5    263    85     0.0        0.0        5     5     0     1     1
c2     130   8    239    144    0.0        0.0        5     6     1     1     1
c3     120   35   232    754    0.0        0.1 (0,1)  5     14    2     315   826
c4     109   38   221    1079   0.0        0.0        5     15    2     143   446
c5     37    17   91     1579   0.0        0.0        5     4     0     1     1
c6     369   5    847    55     0.0        0.0        5     10    1     6     46
c7     382   9    869    102    0.0        0.0        5     12    1     5     46
c8     336   54   818    509    0.0        0.0        5     37    3     26    228
c9     349   78   832    707    0.0        0.3 (0,1)  6     67    8     395   1179
c10    213   76   624    1093   0.0        0.0        6     39    4     92    222
c11    499   5    2184   32     0.0        0.0        5     13    1     15    75
c12    498   9    2236   46     0.0        0.0        5     15    1     6     13
c13    463   65   2108   258    0.0        0.0        5     58    5     72    301
c14    427   81   1961   323    0.0        0.0        6     55    5     63    175
c15    299   92   1471   556    0.0        0.0        6     49    4     13    43
c16    500   5    4740   11     0.0        0.0        5     12    1     5     28
c17    499   9    4698   18     0.0        0.0        5     13    1     17    47
c18    486   70   4668   113    0.1 (0,1)  0.0        6     50    4     36    71
c19    473   98   4490   146    0.0        0.0        6     69    6     55    152
c20    386   143  3850   267    0.0        0.0        6     85    7     12    27
Tables 1, 2 and 3.

1. Reduced Size: the characteristics of the graphs once reduced.

2. Cost: the cost of the optimal solution (Opt.) and the average difference
between approximate and optimal costs over 10 executions per graph of both GA
and GBA. (For the e-graphs, GA has been executed only once.) When the
difference is not zero, the minimum and maximum differences between the cost
of an approximate solution and the optimal cost are given in brackets on the
right.

3. CPU-Time (Secs): the time in seconds spent in the pre-processing to reduce
the graph size (Red.), and the average time used by GA and GBA excluding the
pre-processing.

4. Iterations: the average (Avg.) and maximum (Max.) number of iterations
until the best solution has been found by GA.



Table 2. Experimental results with graphs d

       Reduced size      Cost                         CPU-Time (secs)   Iterations
Graph  |V|   |N|  |E|    Opt.   GA         GBA        Red.  GA    GBA   Avg.  Max.
d1     274   5    510    106    0.0        0.0        50    9     1     8     16
d2     285   10   523    220    0.0        0.0        50    13    1     2     11
d3     224   67   441    1565   0.0        0.2 (0,2)  51    42    5     198   519
d4     159   66   339    1935   0.0        0.0        51    33    3     3     9
d5     97    48   246    3250   0.0        0.0        52    17    2     84    321
d6     761   5    1741   67     0.0        0.0        50    22    1     16    45
d7     754   10   1735   103    0.0        0.0        50    22    1     1     1
d8     731   124  1708   1072   0.2 (0,1)  1.3 (0,3)  51    259   25    257   620
d9     654   155  1613   1448   0.0        0.3 (0,2)  52    348   30    274   556
d10    418   146  1317   2110   0.0        0.8 (0,2)  53    156   19    303   889
d11    993   5    4674   29     0.0        0.0        50    27    2     128   270
d12    1000  10   4671   42     0.0        0.0        50    29    2     1     1
d13    922   122  4433   500    0.0        0.0        51    232   19    108   344
d14    853   160  4173   667    0.0        0.0        52    317   24    71    195
d15    550   157  2925   1116   0.0        0.0        54    146   19    89    385
d16    1000  5    10595  13     0.0        0.0        50    23    1     11    53
d17    999   9    10531  23     0.0        0.0        51    25    1     3     13
d18    978   145  10140  223    1.3 (1,2)  1.4 (1,2)  50    320   24    406   967
d19    938   193  9676   310    1.4 (1,2)  1.0 (1,1)  51    353   29    182   468
d20    814   324  8907   537    0.0        0.0        52    568   63    26    78
Table 3. Experimental results with graphs e

       Reduced size      Cost                         CPU-Time (secs)   Iterations
Graph  |V|   |N|  |E|    Opt.   GA         GBA        Red.  GA    GBA   Avg.  Max.
e1     680   5    1286   111    0          0.0        773   18    1     2     6
e2     710   9    1328   214    0          0.0        773   21    1     13    37
e3     637   199  1233   4013   0          7.4 (4,10) 779   442   52    684   1895
e4     435   164  964    5101   0          2.1 (1,3)  783   220   25    661   1524
e5     222   108  649    8128   0          0.0        800   62    6     164   393
e6     1845  5    4318   73     0          0.0        756   56    3     7     28
e7     1891  10   3388   145    0          0.0        775   66    3     23    128
e8     1723  286  4193   2640   1 (1,1)    2.4 (0,5)  791   1803  202   618   1219
e9     1608  358  4069   3604   0          3.9 (2,6)  799   2431  320   978   1481
e10    1046  366  3388   5600   0          4.3 (1,6)  827   1427  122   169   582
e11    2498  5    12093  34     0          0.0        780   81    3     7     27
e12    2500  10   12123  67     0          0.0        777   87    4     107   641
e13    2341  321  11760  1280   2 (2,2)    2.0 (0,4)  788   3461  312   882   2898
e14    2139  388  11325  1732   0          1.6 (1,2)  802   3930  229   179   361
e15    1461  443  8514   2784   0          0.0        837   2002  180   201   504
e16    2500  5    29332  15     0          0.0        781   74    3     19    80
e17    2500  10   29090  25     0          0.0        779   76    3     11    35
e18    2429  355  28437  564    7 (7,7)    2.8 (2,4)  781   2300  193   502   1823
e19    2351  485  27779  758    3 (3,3)    0.4 (0,1)  787   2923  310   374   653
e20    1988  758  24423  1342   0          0.0        801   5078  381   37    51



Table 4.

1. Error: the percentage of solutions of IKMB, GA and GBA with error not
greater than 0% and 1%, and the percentage of graphs for which GBA has found
an optimal solution when 10 executions per graph are considered (Best 10).

2. Avg. CPU-Time (Secs): average time per execution of GA and GBA. Partial
does not include the graph pre-processing time, while Global does.



Table 4. Summary of the experiments

       Error (%)                                       Avg. CPU-Time (secs)
       IKMB         GA           GBA                   GA               GBA
Class  0%    ≤1%    0%    ≤1%    0%    ≤1%   Best 10   Partial  Global  Partial  Global
c      55.0  80.0   99.5  100    98.0  100   100       31       37      3        8
d      35.0  80.0   89.0  100    80.5  100   90        148      199     14       65
e      36.8  68.4   80.0  95.0   59.0  100   70        1328     2116    118      906
Total  42.3  76.1   89.5  98.3   79.2  100   86.7      502      785     45       326
Performance analysis. Summarizing Tables 1 to 4, GBA finds a globally
optimal solution in 79.2% of all runs and is within 1% of a global optimum in
all the executions. These results clearly surpass those obtained by IKMB but
are slightly inferior to those obtained by GA. However, the difference in
execution time between the two algorithms more than makes up for the small
differences in quality. GBA turns out to be 11 times faster than GA when the
pre-processing time to reduce the graph size (which includes the shortest
paths computation) is not taken into account. As indicated in [5], GA was
already clearly faster than the other algorithms considered. Taking the
pre-processing time into account, GBA turns out to be more than twice as
fast as GA.
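Both speed-up figures can be checked against the "Total" row of Table 4, which gives average CPU seconds per execution. This is plain arithmetic on the published numbers, not a new measurement.

```python
# Average CPU seconds per execution, "Total" row of Table 4.
ga_partial, ga_global = 502, 785      # GA: without / with pre-processing
gba_partial, gba_global = 45, 326     # GBA: without / with pre-processing

speedup_partial = ga_partial / gba_partial   # pre-processing excluded
speedup_global = ga_global / gba_global      # pre-processing included
print(f"{speedup_partial:.1f}x {speedup_global:.1f}x")  # prints: 11.2x 2.4x
```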

It is important to highlight the impact that the stop conditions chosen for
each of these algorithms have on the final results. We have verified that
when GBA is given more time, the quality of its results comes very close to
that of GA. We feel that GBA converges much faster towards good approximate
solutions, which, we should remember, are the real objective of our
algorithm.

Finally, we must note that the results obtained by both GA and GBA can be
considered surprisingly good, bearing in mind that the problem is NP-hard.
Part of these results can be attributed to the weakness of some of the
graphs, in particular those with a small number of terminals. This is shown
by the small number of iterations needed to find the optimum, as reported in
Tables 1, 2 and 3. GBA attains better results than GA for graphs e18 and
e19, which, in theory, are the hardest graphs of the benchmark. We conclude
that more tests with difficult graphs would help to decide which of the two
algorithms definitely performs better.



6 Conclusions 

A new approximate algorithm (GBA) for the SPG based on the SPH has been
introduced. The central point of the algorithm is the iterative execution of
the SPH, using the vertices of the tree obtained in the previous iteration as
a new set of terminals to be connected. Since the initial set of terminals is
used in the first iteration, GBA preserves the performance ratio bound of the
SPH, so the solution cost will be lower than twice the optimal solution cost.
GBA can also be used to obtain a shorter tree once another SPG heuristic has
produced an approximation. In this case, the set of vertices of the
approximate tree should be used as the initial set of terminals.

The worst-case time complexity of GBA is O(|V|^3), where |V| is the number
of vertices. (This bound is achieved by limiting the maximum number of
iterations to c|V| for a small constant c.) Since our aim is to improve on
the results of other SPG heuristics, some routines must first be run to
reduce the size of the graphs. These routines also require shortest paths
(Floyd's O(|V|^3) algorithm), so the complexity of GBA is the lowest that a
well-performing SPH-based heuristic can achieve.
Three additional elements have been incorporated into GBA to improve its
performance. Firstly, the vertices are randomly reordered each time a new
tree is computed. Secondly, the Steiner vertices of degree two are not
considered as terminals in the next iteration. Finally, small perturbations
are applied to the new terminal subset, allowing vertices to be included or
excluded at random.

It has been shown that at least one of these three elements contributes to
each of the following positive effects on the algorithm:



— To increase the chances that both non-terminal leaves and Steiner vertices
come up and, therefore, that shorter trees are obtained.

— To increase the capability of the algorithm to explore new solutions.

— To expand the number of candidate solutions that can be reached by the
algorithm.

— To avoid “out of control” biases that depend on both the vertex order and
the particular algorithm implementation.

— To reduce the solution space to subsets of Steiner vertices with no more
elements than the number of terminals. (It is known that there is at least
one optimal solution with these characteristics.)

— To give those non-terminals which the SPH can never incorporate into a
solution the possibility of becoming part of a tree.

— To increase the speed of the algorithm.

GBA has replaced arbitrariness with non-determinism by means of the random
vertex reordering. As a consequence, GBA becomes nearly¹ independent of the
vertex order, and its empirical results carry over almost unchanged to any
isomorphic graph instance.

The performance of the algorithm has been tested on well-known benchmark
instances: random graphs with up to 2,500 vertices and 62,500 edges.
Experimental results show that in all the executions GBA finds a solution
within 1% of the global optimum. Moreover, the optimal solution is found in
79% of all the executions. When 10 executions per graph are considered, it
is found for 87% of the 60 largest graphs of the benchmark.

This performance has been compared to that of a recently published genetic
algorithm (GA), which clearly outperformed all previous heuristics in both
solution quality and runtime. The solutions attained by GA are marginally
better than those of GBA. However, GBA clearly outperforms GA in runtime.
With the stop conditions under which both algorithms attained these results,
and excluding the pre-processing time to reduce the graph size, GBA is about
11 times faster than GA on average. Taking the pre-processing time into
account, GBA turns out to be more than twice as fast as GA.



¹ This reordering does not affect the shortest path computation.
Abstract. In this paper we survey the work done on graphs in random
geometric models. We present some heuristics for the Minimal Linear
Arrangement problem on $[0,1]^2$, and we conclude with a collection of open
problems.



1 Introduction 

The probabilistic method has become a powerful tool in combinatorics. A
particularly fruitful application of the probabilistic method has been the
study of graph invariants (chromatic number, independence number, etc.) for
random graphs. The field started in 1959 with a paper by Erdős and Rényi
[ER59]. In the last two decades, the techniques developed in the
probabilistic method have been used in the design and analysis of
algorithms. There have been two models of random graphs. Given
$n \in \mathbb{N}$ and $0 < p < 1$, define $G_{n,p}$ as the probability space
over the set of graphs on vertex set $V = [n] = \{1, \dots, n\}$ in which any
two vertices $i, j \in V$ form an edge with probability $p$, independently of
all other pairs. For the second model, given $n, m \in \mathbb{N}$, let
$G_{n,m}$ be the probability space of the graphs with vertex set $V = [n]$
and edge set $E_m$, a random subset of $m$ edges chosen uniformly from all
possible edges of the complete graph on $n$ vertices. In general, $G_{n,m}$
behaves similarly to $G_{n,p}$ when $p \sim m/\binom{n}{2}$. Recall that
$f \sim g$ denotes that $f/g \to 1$ as the variables tend to infinity. Erdős
and Rényi [ER60] considered the $G_{n,m}$ model to study the threshold for
several questions related to the connectivity of graphs. To study the
evolution of graphs, they start with the empty graph on $n$ vertices and add
edges randomly one by one until there are $m$ edges. Their main result is
that with high probability (whp) a graph becomes connected when
$m > \frac{n}{2}\log n + \omega(n)$. Recall that a sequence of events
$\{\mathcal{E}_n\}$ occurs whp if $\Pr[\mathcal{E}_n] \to 1$ as
$n \to \infty$. Therefore, for $p = (\log n + \omega(1))/n$, the graphs
$G_{n,p}$ will be connected whp. Good sources for random graphs are
[Bol85, FM96] and chapter 10 of [ASE92]. See [MR95] for the “adaptation” of
the probabilistic method to the design of algorithms.
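The following is a minimal sketch of both models on the vertex set [n] = {1, ..., n}, using only Python's standard library; the function names are illustrative. `gnp` keeps each of the C(n,2) pairs independently with probability p. `gnm` draws a uniformly random m-subset of the pairs. `is_connected` uses union-find and can be used to observe the connectivity threshold empirically.

```python
import itertools
import random


def gnp(n, p, rng):
    """G(n,p): each pair {i, j} of [n] is an edge independently with prob. p."""
    return {(i, j) for i, j in itertools.combinations(range(1, n + 1), 2)
            if rng.random() < p}


def gnm(n, m, rng):
    """G(n,m): a uniformly random m-subset of the C(n,2) possible edges."""
    pairs = list(itertools.combinations(range(1, n + 1), 2))
    return set(rng.sample(pairs, m))


def is_connected(n, edges):
    """Union-find check that the graph on [n] with the given edges is connected."""
    parent = list(range(n + 1))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(v) for v in range(1, n + 1)}) == 1
```

For example, sampling `gnm(n, m, rng)` over many seeds with m comfortably above (n/2) ln n should give connected graphs in most trials. For small n, the observed frequencies need not be close to the asymptotic statement.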

* This research was partially supported by the ESPRIT LTR Project no. 20244 - 
ALCOM-IT, CICYT Project TIC97-1475-CE and CIRIT project 1997SGR-00366 

M. Luby, J. Rolim, and M. Serna (Eds.): RANDOM’98, LNCS 1518, pp. 294-306, 1998. 

© Springer-Verlag Berlin Heidelberg 1998
In this paper we will consider the geometric models of random graphs.
Geometric random graphs are defined on the unit cube $[0,1]^d$, $d \ge 2$, by
randomly choosing a sequence $\mathcal{X} = \{X_n\}$ of independent and
uniformly distributed points on $[0,1]^d$. We can consider two main models.
The first is the Euclidean model, the weighted complete graph on
$\mathcal{X}$ in which the weight of an edge is its Euclidean length (see
section 2). The second is the random geometric model $G_n(r)$, where the
existence of edges depends on a parameter $0 < r < 1$ that could be a
function of $n$ (see section 3). There is another model of geometric graph,
called the independent model, where the distances are independent and
identically distributed (i.i.d.) random variables from the uniform
distribution on $[0,1]$ and neither symmetry nor the triangle inequality is
assumed. This model was introduced in [Kar77]. It is a difficult model to
work with and we shall not consider it here (see, for example, [AB92] and
chapter 2 of [Wei78] for an interesting discussion of the independent
model).

Before going further, let us review some basic definitions from probability
theory. Let $\{X_n\}$ be a sequence of random variables and $X$ a random
variable. We say that $X_n$ converges in probability to $X$, written
$X_n \xrightarrow{P} X$, if for all $\epsilon > 0$,
$\Pr[|X_n - X| > \epsilon] \to 0$ as $n \to \infty$. We say that $X_n$
converges almost surely (a.s.) to $X$, written $X_n \xrightarrow{a.s.} X$, if
$\Pr[\limsup_n X_n = X = \liminf_n X_n] = 1$. This type of convergence is
also called convergence with probability 1. Convergence a.s. is quite a
strong statement, but it is an asymptotic condition: it says nothing about
finite $n$. Convergence a.s. implies convergence in probability, but not
conversely. The Borel-Cantelli theorem gives a sufficient condition for
convergence a.s. (For good treatments of stochastic convergence see, for
example, chapter 1 of [Wei78] or chapter 4 of [Chu74].)

There are two different ways to get a set of uniform i.i.d. points on the
unit square $[0,1]^2$ (both extend easily to the $d$-dimensional case):

1. The vertices are random variables $X_i$, i.i.d. from the uniform
distribution on $[0,1]^2$.

2. Use a two-dimensional point Poisson process with intensity $n$ on
$\mathbb{R}^2$ and keep the points that fall in the unit square.

Recall that a two-dimensional point Poisson process with intensity $\lambda$
is a process of events in the plane with two properties. (i) For any region
of area $A$, the number of events in the region is Poisson distributed with
mean $\mu = \lambda A$. (ii) The events in nonoverlapping regions are
independent.
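The second construction can be sketched as follows. Sampling directly on the unit square is equivalent to restricting a process on the whole plane: by property (ii) the points outside the square do not affect those inside, and by property (i) the number inside is Poisson with mean n · 1. Given that number, the points are independent and uniform on the square, which is a standard property of the homogeneous Poisson process. The Poisson count is generated by counting rate-1 exponential arrivals in [0, n], which avoids the underflow that e^{-n} causes for large n. The function names are illustrative.

```python
import random


def poisson(lam, rng):
    """Poisson(lam) sample: number of rate-1 exponential arrivals in [0, lam]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def poisson_points_unit_square(n, rng):
    """Point Poisson process of intensity n restricted to [0,1]^2:
    a Poisson(n) number of points, placed independently and uniformly."""
    k = poisson(n * 1.0, rng)    # the unit square has area 1
    return [(rng.random(), rng.random()) for _ in range(k)]
```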



2 The BHH-theorem 

In one of the seminal papers on the probabilistic analysis of combinatorial
optimization problems in the Euclidean $d$-dimensional unit cube $[0,1]^d$,
Beardwood, Halton and Hammersley proved the following result [BHH59]:

Theorem 1. Let $\mathcal{X} = \{X_i\}$ be a sequence of independent and
uniformly distributed points in $[0,1]^d$. Let $L_n$ denote the length of the
optimal traveling salesman tour (TSP) through $\{X_1, \dots, X_n\}$. Then
there exists a constant $0 < \beta(d) < \infty$ such that

$$\frac{L_n}{n^{(d-1)/d}} \to \beta(d) \quad \text{a.s. as } n \to \infty. \qquad (1)$$

Obtaining the exact value of the constants for specific dimensions is still
an open problem. Beardwood, Halton and Hammersley gave upper and lower bounds
for certain values of $d$; for instance, for $d = 2$ they proved
$0.44194 < \beta(2)/\sqrt{2} < 0.6508$. The best known estimate,
$0.70 < \beta(2) < 0.73$, was obtained empirically by D. Johnson (see section
2.3 of [Ste97]).

Because some of the steps in the proof of the BHH theorem are valuable in
their own right, let us sketch the proof for the two-dimensional case. The
reader can consult [Ste97] to fill in the details and to extend the proof to
$d$ dimensions. The basic steps in the proof of the BHH theorem are the
following:

1. Show concentration around the mean value. 

2. Bound (from above and below) the expected value. 

3. Use (De)Poissonization to show the convergence to a constant of the mean 
value. 

For the TSP, the first step can be achieved using either the Talagrand or
the Azuma inequality, which together with the Borel-Cantelli theorem gives

$$L_n - E[L_n] = O(\log n) \quad \text{a.s.} \qquad (2)$$

A dissection of the unit square gives a rough upper bound on $E[L_n]$ for
the TSP in $[0,1]^2$. The square $[0,1]^2$ is covered with $O(n)$ squares of
size $n^{-1/2} \times n^{-1/2}$. An easy pigeon-hole argument gives a
constant $c$ such that, for any set $\{x_1, \dots, x_n\} \subset [0,1]^2$,

$$\min\{|x_i - x_j| : x_i, x_j \in \{x_1, \dots, x_n\},\ i \ne j\} \le c\, n^{-1/2}.$$

Let $l_n$ be the maximum, over all sets of $n$ points, of the optimal tour
length. Inserting one of the two closest points next to the other into a
tour through the remaining $n-1$ points costs at most $2c\,n^{-1/2}$, so
$l_n \le l_{n-1} + 2c\,n^{-1/2}$. Summing this recursion bounds the maximum,
and hence the expected value. So, for some constant $c_0$,
$E[L_n] \le c_0 n^{1/2}$.
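The pigeon-hole step can be made concrete. Assume n ≥ 2 and take k = ⌊√(n−1)⌋. A k × k grid then has k² ≤ n − 1 < n cells, so two of the points share a (closed) cell and are within its diagonal √2/k of each other, which is of order n^{-1/2}. The sketch below computes this bound and compares it with the closest-pair distance of random point sets. It illustrates the inequality only, not the tour-length recursion.

```python
import math
import random
from itertools import combinations


def min_pairwise_distance(points):
    """Closest-pair distance by brute force (fine for a few hundred points)."""
    return min(math.dist(p, q) for p, q in combinations(points, 2))


def pigeonhole_bound(n):
    """Upper bound on the closest-pair distance of ANY n >= 2 points in [0,1]^2.

    With k = floor(sqrt(n - 1)) the k x k grid has k*k <= n - 1 < n cells,
    so two points share a cell and lie within its diagonal sqrt(2)/k."""
    k = math.isqrt(n - 1)
    return math.sqrt(2) / k
```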

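For the lower bound, we use the following fact about $n$ independent and
uniformly distributed points in the unit square. There is a constant $c_1$
such that the expected distance from each point to its nearest neighbour
satisfies

$$E\left[\min\{|X_i - X_j| : j \ne i\}\right] \ge c_1 n^{-1/2}.$$

Furthermore, any tour has $n$ edges, and the edge leaving each $X_i$ is at
least as long as the distance from $X_i$ to its nearest neighbour. Therefore

$$E[L_n] \ge c_1 n^{1/2}. \qquad (3)$$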

To extract asymptotics on $a_k = E[L_k]$, it is useful to consider a
continuous problem. For any set $S \subset \mathbb{R}^2$, let $L(S)$ denote
the length of the shortest tour through $S$. Let $\Pi$ denote a
two-dimensional point Poisson process with unit intensity. We can define a
new stochastic process $\{Z(t)\}$ by $Z(t) = L(\Pi \cap [0,t]^2)$. The
cardinality of the set $\Pi \cap [0,t]^2$ is a Poisson random variable with
mean $t^2$. Scaling by a factor $1/t$ maps all the points into the unit
square and divides tour lengths by $t$. Therefore the expected value of
$Z(t)$ can be expressed in terms of the $a_k$:

$$E[Z(t)] = t \sum_{k=0}^{\infty} a_k e^{-t^2} \frac{t^{2k}}{k!}.$$

From this formula it is possible to derive asymptotics for the $a_k$. Notice
that, since $a_k \le c_0 k^{1/2}$ for all $k$, the series converges and
$E[Z(t)]$ is a continuous function of $t$.

The key step in the proof is a dissection of $[0,t]^2$ that gives a
geometric subadditivity expression for $L$. Given any $m \in \mathbb{N}$ and
$t \in \mathbb{R}$, $t > 0$, let $\{Q_i\}$, $1 \le i \le m^2$, be a partition
of $[0,t]^2$ into squares of edge length $t/m$. We can select one point from
each nonempty subsquare $Q_i$ and compute a shortest TSP tour on this set of
points. This tour can then be completed by gluing in, at each of its points,
the optimal subtour of the corresponding square. The additional cost of the
gluing process can be bounded by the bound on $L_{m^2}$ times the scaling
factor $t$. So there is a constant $C$ such that

$$L(\{x_1, \dots, x_n\} \cap [0,t]^2) \le \sum_{i=1}^{m^2} L(\{x_1, \dots, x_n\} \cap Q_i) + Ctm. \qquad (4)$$

From equation (4) we get the recursion
$E[Z(t)] \le m^2 E[Z(t/m)] + Ctm$. Standard algebraic techniques, together
with a change of variables, show that there is a constant $\gamma$ such that

$$\sum_{k=0}^{\infty} a_k e^{-t} \frac{t^k}{k!} \sim \gamma\, t^{1/2} \quad \text{as } t \to \infty. \qquad (5)$$

To de-Poissonize, one needs a relation between the values $a_n$ and $E[a_N]$ for a Poisson random variable $N$ with mean $n$. The key fact is that the values $a_k$ do not change rapidly. Notice that joining a tour on $n$ points with a disjoint tour on $m$ points, to obtain a tour on the $n + m$ points, costs at most an additional $2\sqrt{2}$. Combining this with the upper bound $a_m \le c_0 m^{1/2}$ and the monotonicity of the tour length, one gets $|a_n - a_k| \le c|n - k|^{1/2}$.

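If $N$ is a Poisson random variable with mean $n$, then

$$E[a_N] = \sum_{k=0}^{\infty} a_k\, e^{-n} \frac{n^k}{k!} \quad\text{and}\quad |a_n - E[a_N]| \le \sum_{k=0}^{\infty} |a_n - a_k|\, e^{-n} \frac{n^k}{k!}.$$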

This relation, together with some algebraic manipulation, gives $|a_n - E[a_N]| = O(n^{1/4})$. Putting everything together and transferring the result to the discrete problem,

$$\frac{E[L_n]}{n^{1/2}} \to \gamma \quad \text{as } n \to \infty. \qquad (6)$$
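The Poissonization identity and the de-Poissonization bound can be checked numerically for any given sequence. The sketch below assumes a hypothetical stand-in sequence $a_k = \sqrt{k}$ (the true values $a_k = E[L_k]$ are not known in closed form) and evaluates the Poisson mixture $E[a_N]$ by truncating the series well into the upper tail; the function names are ours.

```python
import math


def poisson_pmf(k, lam):
    """P(N = k) for N ~ Poisson(lam), computed in log space to avoid overflow."""
    if lam == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def poisson_mixture(a, lam):
    """E[a_N] = sum_k a(k) e^{-lam} lam^k / k!, truncated far into the upper tail."""
    kmax = int(lam + 12 * math.sqrt(lam) + 30)
    return sum(a(k) * poisson_pmf(k, lam) for k in range(kmax + 1))


def expected_Z(a, t):
    """E[Z(t)] = t * sum_k a_k e^{-t^2} t^{2k} / k!  (N ~ Poisson(t^2), rescaled by t)."""
    return t * poisson_mixture(a, t * t)


a = math.sqrt  # hypothetical stand-in a_k = sqrt(k), NOT the true TSP means
for n in (10, 100, 1000, 10000):
    gap = abs(a(n) - poisson_mixture(a, n))
    print(n, gap, n ** 0.25)  # the gap stays far below the O(n^{1/4}) bound
```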

Notice that equation (5) gives the existence of $\beta$ in the statement of the BHH theorem, equations (1) and (3) give the condition $0 < \beta(d) < \infty$, and equation (3) gives the a.s. limit.
3 Other work on the Euclidean model

The dissection of the unit cube into smaller disjoint subcubes has been used repeatedly. In particular, Karp [Kar77] used the dissection technique to give an algorithm that approximates the Euclidean TSP on $[0,1]^2$. Given a uniform i.i.d. set $X$ in $[0,1]^2$, Karp's algorithm dissects $[0,1]^2$ into $n/m(n)$ squares $\{Q_i\}$, each of area $m(n)/n$. Then, whp, every square will contain at most $m(n)$ points. Using dynamic programming, an optimum tour is constructed in each square in polynomial time, so $m(n)$ should be of order $\log n$ or, better, $\log\log n$. Consider each tour in each $Q_i$ as a single point $p_i$, and for any $1 \le i, j \le n/m(n)$, define the distance between $p_i$ and $p_j$ as the shortest Euclidean distance between any point in $Q_i$ and any point in $Q_j$. Construct the minimum-length spanning tree joining all the points $\{p_i\}$ (this can be done in $O(n \log n)$ steps [SH75]). The solution to the TSP is the closed walk that traverses each subtour once and each tree edge twice. Karp proves that, asymptotically, with probability 1 the algorithm produces a tour of length within $(1+\epsilon)$ of the optimal, and with probability 1 it runs in $O(n^2 \log n)$ steps. Karp's approximation heuristic was generalized to the $d$-dimensional cube in [HT82]. Some work has also been done on the Euclidean directed TSP on $[0,1]^2$, where the direction between any two random points in $[0,1]^2$ is chosen independently with probability 1/2. If $D_n$ denotes the length of the shortest directed tour on such a graph, Steele [Ste86] proves that asymptotically $E[D_n] \sim \gamma\sqrt{n}$ for a constant $\gamma$. Proving an a.s. convergence result for the Euclidean directed TSP remains open. As mentioned in the introduction, we shall not consider the asymmetric TSP and refer the reader to section 2.3.1 in [FM96].
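The following Python sketch mirrors the structure of Karp's partitioning scheme described above, under simplifying assumptions of ours: a fixed $k \times k$ grid instead of cells of area $m(n)/n$, the Held-Karp dynamic program for the exact subtours, and a quadratic-time Prim algorithm (rather than an $O(n \log n)$ method) for the spanning tree. It returns the length of the closed walk; shortcutting repeated points would turn this walk into a tour that is no longer.

```python
import math
import random


def held_karp(pts):
    """Exact shortest closed tour through pts (Held-Karp dynamic program, O(2^m m^2))."""
    m = len(pts)
    if m <= 1:
        return 0.0
    d = [[math.dist(p, q) for q in pts] for p in pts]
    INF = float("inf")
    dp = [[INF] * m for _ in range(1 << m)]  # dp[mask][j]: path from 0 through mask ending at j
    dp[1][0] = 0.0
    for mask in range(1 << m):
        if not mask & 1:
            continue
        for j in range(m):
            cur = dp[mask][j]
            if cur == INF:
                continue
            for k in range(m):
                if not (mask >> k) & 1:
                    nxt = mask | (1 << k)
                    if cur + d[j][k] < dp[nxt][k]:
                        dp[nxt][k] = cur + d[j][k]
    full = (1 << m) - 1
    return min(dp[full][j] + d[j][0] for j in range(1, m))


def karp_walk_length(points, k, max_cell=12):
    """Closed-walk length of a Karp-style partitioning scheme on a k x k grid (sketch)."""
    cells = {}
    for p in points:
        key = (min(int(p[0] * k), k - 1), min(int(p[1] * k), k - 1))
        cells.setdefault(key, []).append(p)
    groups = list(cells.values())
    if max(len(g) for g in groups) > max_cell:
        raise ValueError("cell too crowded for the exact DP; use a finer grid")
    subtours = sum(held_karp(g) for g in groups)  # optimal subtour in every non-empty cell

    def gdist(a, b):  # shortest distance between a point of cell a and a point of cell b
        return min(math.dist(p, q) for p in a for q in b)

    c = len(groups)
    in_tree, best, tree = [False] * c, [math.inf] * c, 0.0
    best[0] = 0.0
    for _ in range(c):  # Prim's algorithm on the cells
        i = min((v for v in range(c) if not in_tree[v]), key=lambda v: best[v])
        in_tree[i] = True
        tree += best[i]
        for v in range(c):
            if not in_tree[v]:
                best[v] = min(best[v], gdist(groups[i], groups[v]))
    return subtours + 2 * tree  # each subtour once, each tree edge twice


rng = random.Random(3)
sample = [(rng.random(), rng.random()) for _ in range(60)]
print(karp_walk_length(sample, 5))
```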

The result of the BHH theorem was extended to other combinatorial problems for the Euclidean model on $[0,1]^d$: the minimum matching [Pap78], the Steiner minimal tree [Ste81], and the minimum spanning tree [Ste88]. For further work on this last problem see [AB92] and [Pen97a]. In fact, Steele generalized the BHH theorem to a class of combinatorial problems that can be formulated as a subadditive Euclidean functional $F$ from finite subsets of $[0,1]^d$ to the nonnegative real numbers. The functional $F$ measures the cost that the problem optimizes (length of the tour, weight of the spanning tree, etc.). $F$ is called a subadditive functional if it has the following properties:

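1. $F(\emptyset) = 0$.

2. $F(cx_1, \dots, cx_n) = c\,F(x_1, \dots, x_n)$ for all $c > 0$.

3. $F(x_1 + y, x_2 + y, \dots, x_n + y) = F(x_1, x_2, \dots, x_n)$ for all $y \in [0,1]^d$.

4. Geometric subadditivity: there is a constant $C_0 > 0$ such that, for all $m, n \ge 1$,

$$F(x_1, \dots, x_n) \le \sum_{i=1}^{m^2} F(\{x_1, \dots, x_n\} \cap Q_i) + C_0 m,$$

where $Q_i$ is the $i$-th square of size $1/m \times 1/m$.

5. $F(x_1, \dots, x_n) \le F(x_1, \dots, x_n, x_{n+1})$.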
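As a quick sanity check, properties 1, 2, 3 and 5 can be tested numerically for the TSP functional on a handful of points. The brute-force `tsp_length` below is a hypothetical helper that is only practical for very small inputs; it illustrates the axioms and proves nothing.

```python
import itertools
import math
import random


def tsp_length(pts):
    """Optimal closed-tour length by brute force over permutations (tiny inputs only)."""
    pts = list(pts)
    if len(pts) <= 1:
        return 0.0
    first, rest = pts[0], pts[1:]
    best = math.inf
    for perm in itertools.permutations(rest):  # fix the first point, permute the others
        tour = (first,) + perm
        length = sum(math.dist(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))
        best = min(best, length)
    return best


rng = random.Random(1)
xs = [(rng.random(), rng.random()) for _ in range(6)]
F = tsp_length(xs)
print("1. empty set:", tsp_length([]) == 0.0)
print("2. homogeneity:", abs(tsp_length([(2.5 * a, 2.5 * b) for a, b in xs]) - 2.5 * F) < 1e-9)
print("3. translation:", abs(tsp_length([(a + 0.3, b + 0.7) for a, b in xs]) - F) < 1e-9)
print("5. monotonicity:", tsp_length(xs + [(0.5, 0.5)]) >= F - 1e-12)
```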
Notice that geometric subadditivity is equation (4) in the proof of the BHH theorem. Steele proved that for each problem in this class there exists a constant $\beta_F(d)$ (depending on $F$ and $d$) such that, with probability one,

$$\lim_{n \to \infty} \frac{F_n}{n^{(d-1)/d}} = \beta_F(d),$$

where $F_n$ is the optimal value of the functional under consideration for the graph formed with the first $n$ points of $X$ (see chapter 3 of [Ste97]). It is an open problem to determine the value of $\beta_F$ for any of the interesting functionals, even for the case $d = 2$.



An important set of problems motivated by the field of Computational Geometry is the following. Given a set $X = \{X_i\}$ of random points uniformly distributed on $[0,1]^d$, we wish to find:

1. The length of the largest nearest-neighbor link, defined by

$$Z_n = \max_{1 \le i \le n} \min_{j \ne i} \|X_i - X_j\|_2.$$

2. The length $N_{k,n}$ of the $k$-th nearest neighbor graph, in which each point is connected to its $k$-th nearest neighbor.

3. The length $V_n$ of the Voronoi diagram of the points $X$.

4. The length $D_n$ of the Delaunay triangulation of the points $X$.

For problem 1, Steele and Tierney [ST86] gave limiting distributions for $Z_n = Z(X_1, \dots, X_n)$. If the sample $X$ is drawn uniformly from a $d$-dimensional torus, then for $d \ge 3$ the limit distribution differs from the case where $X$ is drawn from $[0,1]^d$, because of boundary effects in the cube. This result should be taken into account by anyone running simulations that involve nearest-neighbor computations: in three or more dimensions, they should work on the torus instead of the cube. The asymptotic dominance of the boundary with respect to the limit distribution of $Z_n$ was studied further in [DH89], which reinforced the results obtained in [ST86].
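To see the torus-versus-cube distinction concretely, the sketch below computes $Z_n$ for the same sample under the ordinary Euclidean distance on $[0,1]^d$ and under the wrap-around distance of the flat unit torus. The function name and parameters are illustrative. Because torus distances never exceed cube distances, the torus value is never larger; the gap comes from points near the boundary.

```python
import math
import random


def largest_nn_link(points, torus=False):
    """Z_n = max_i min_{j != i} |X_i - X_j|_2 on the unit cube or on the flat unit torus."""
    def dist(p, q):
        s = 0.0
        for a, b in zip(p, q):
            delta = abs(a - b)
            if torus:
                delta = min(delta, 1.0 - delta)  # wrap-around distance along each axis
            s += delta * delta
        return math.sqrt(s)

    return max(
        min(dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    )


rng = random.Random(7)
pts = [tuple(rng.random() for _ in range(3)) for _ in range(500)]  # d = 3
print(largest_nn_link(pts), largest_nn_link(pts, torus=True))
```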

Avram and Bertsimas [AB93] studied problems 2, 3 and 4, using a Poisson process with intensity $n$ to generate the set $X$ of points uniformly distributed in $[0,1]^2$. It is known that if $L_n$ denotes the optimal length for each of these problems, then

$$\lim_{n \to \infty} \frac{E[L_n]}{n^{1/2}} = \beta,$$

where in this case the constant $\beta$ for each problem is explicitly known [Mil70]. Avram and Bertsimas prove a central limit theorem for each problem:

$$\lim_{n \to \infty} \Pr\left[\frac{L_n - E[L_n]}{\sigma[L_n]} \le x\right] = \Phi(x),$$

where $\Phi(x)$ is the standard normal distribution function. They also give the rate of convergence for each problem.







J. Diaz, J. Petit, and M. Serna 



4 Random Geometric Graphs 



Let us consider the following model of geometric graph: given $n \in \mathbb{N}$ and $r \in [0,1]$, define the random geometric graph $G_n(r)$ by selecting $n$ points uniformly from $[0,1]^2$ to form the vertex set $X = \{X_1, X_2, \dots, X_n\}$, so that the vertices $X_i$ are i.i.d. random variables. Define the edge set $E_r = \{(X_i, X_j) : \|X_i - X_j\|_\infty \le r\}$.

Notice that we are using the $\ell_\infty$ norm on the unit square $[0,1]^2$: if $X_i = (x_i, y_i)$ and $X_j = (x_j, y_j)$, the distance is defined by $\|X_i - X_j\|_\infty = \max\{|x_i - x_j|, |y_i - y_j|\}$.

Depending on the specific values of $n$ and $r$, $G_n(r)$ can be dense or sparse, connected or disconnected, etc. Moreover, some of the techniques used for the previous structures do not seem to work for $G_n(r)$ ($r < 1$) (see section 6). The following result is given in [AR97a].

Lemma 1. Given $G_n(r)$, let $\Pr[r]$ denote the probability that two vertices $X_i, X_j$ of the vertex set $X$ form an edge (i.e., $\|X_i - X_j\|_\infty \le r$). Then

$$\Pr[r] = (2r - r^2)^2.$$

Define the random variable $K_n(r) = |E_r|$; then

$$E[K_n(r)] = \binom{n}{2} \Pr[r].$$

In fact,

$$\frac{K_n(r)}{\binom{n}{2}} \to \Pr[r] \quad \text{a.s.}$$
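Lemma 1 is easy to check by simulation. The sketch below samples $G_n(r)$ with the $\ell_\infty$ adjacency used here and compares the fraction of pairs that form an edge with $(2r - r^2)^2$; the sizes and seed are arbitrary illustrative choices. The formula holds because each coordinate difference of two independent uniform points is at most $r$ with probability $1 - (1-r)^2 = 2r - r^2$, and the two coordinates are independent.

```python
import math
import random


def geometric_graph(n, r, seed=0):
    """Sample G_n(r): n uniform points in [0,1]^2, edge iff the L-infinity distance is <= r."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    edges = [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if max(abs(pts[i][0] - pts[j][0]), abs(pts[i][1] - pts[j][1])) <= r
    ]
    return pts, edges


def edge_probability(r):
    """Pr[r] = (2r - r^2)^2 from Lemma 1."""
    return (2 * r - r * r) ** 2


n, r = 600, 0.1
pts, edges = geometric_graph(n, r, seed=3)
print(len(edges) / math.comb(n, 2), edge_probability(r))
```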

Moreover, Appel and Russo study the rate of convergence when both $n$ and $r(n)$ evolve. The basic condition that the sequence $\{r_n\}$ must satisfy is that, as $n \to \infty$,

$$\frac{n r_n^2}{\log n} \to c \qquad (7)$$

for a constant $c \in (0, \infty]$.

Notice that equation (7) imposes a threshold condition $r_n = \Omega(\sqrt{\log n / n})$ on the evolution of $r$ as $n \to \infty$. The reason for this threshold is that it keeps $G_n(r)$ connected. Connectivity is an important issue for $G_n(r)$ graphs. Define the connectivity distance $c_n$ of a random geometric graph by

$$c_n = \inf\{r : G_n(r) \text{ is connected}\}.$$
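Since $G_n(r)$ only gains edges as $r$ grows, $c_n$ equals the longest edge of a minimum spanning tree of the points under the $\ell_\infty$ distance. The sketch below (our illustration, with hypothetical function names) computes $c_n$ this way with a simple $O(n^2)$ Prim algorithm and checks connectivity by graph search.

```python
import math
import random


def linf(p, q):
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def connectivity_distance(points):
    """c_n = inf{r : G_n(r) connected} = longest edge of an L-infinity minimum spanning tree."""
    n = len(points)
    if n <= 1:
        return 0.0
    in_tree, best, longest = [False] * n, [math.inf] * n, 0.0
    best[0] = 0.0
    for _ in range(n):  # Prim's algorithm on the complete graph
        i = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[i] = True
        longest = max(longest, best[i])
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], linf(points[i], points[v]))
    return longest


def is_connected(points, r):
    """Graph search in G_n(r) starting from vertex 0."""
    n = len(points)
    seen, frontier = {0}, [0]
    while frontier:
        i = frontier.pop()
        for j in range(n):
            if j not in seen and linf(points[i], points[j]) <= r:
                seen.add(j)
                frontier.append(j)
    return len(seen) == n


rng = random.Random(0)
n = 300
pts = [(rng.random(), rng.random()) for _ in range(n)]
print(connectivity_distance(pts), math.sqrt(math.log(n) / n))
```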



Appel and Russo prove that asymptotically
Penrose [Pen97b] generalized the connectivity result to any $\ell_p$ metric, $2 \le p \le \infty$, proving that asymptotically, with high probability, if one starts with isolated vertices and adds the corresponding edges as $r$ increases, the resulting graph becomes $(k+1)$-connected at the moment it achieves a minimum degree of $k+1$. Recall that in the Erdős–Rényi model for random graphs $G_{n,p}$ the connectivity threshold is $\log n / n$ [Bol85]. Therefore both models behave similarly with respect to connectivity, but while in the $G_{n,p}$ model the edges are independent, the edges in $G_n(r)$ can have non-zero correlations.

Under the threshold condition of equation (7), Appel and Russo give almost sure asymptotic rates of convergence (divergence) for several graph parameters. For example, they prove that as $n \to \infty$, the maximum vertex degree $\Delta_n(r_n)$ of $G_n(r_n)$ diverges at the same rate as the degree of any particular vertex in $G_n(r_n)$.

For a related parameter, let $\omega_n(r)$ denote the clique number of $G_n(r)$. To bound $\omega_n(r)$ we once more use dissection. Let $l \in \mathbb{N}$ be such that $l \le 1/r < l + 1$. Dissect $[0,1]^2$ into $(l+1)^2$ squares $\{Q_i\}$, each of side at most $r$; then all the points inside any $Q_i$ belong to a common clique in $G_n(r)$, which implies that $n \le (l+1)^2 \omega_n(r)$, so we get the bound

$$\omega_n(r) \ge \frac{n}{(l+1)^2}.$$

Another related parameter is the chromatic number $\chi_n(r)$ of $G_n(r)$. From standard graph theory it is easy to see that $\omega_n(r) \le \chi_n(r) \le \Delta_n(r) + 1$ and $\Delta_n(r) \le \omega_n(2r) - 1$. Using these inequalities, Appel and Russo gave theorems on the rate of convergence of $\omega_n(r)$ and $\chi_n(r)$ when $\{r_n\}$ satisfies equation (7). They also give asymptotic rates of growth for the independence number $\beta_n(r)$ of $G_n(r)$, under the assumption that $\{r_n\}$ satisfies equation (7). To prove this statement, they need an upper bound on $\beta_n(r)$, which they obtain by dissection. Let $\{X_{i_1}, \dots, X_{i_m}\}$ be an independent set (in the graph-theoretic sense) of a random graph $G_n(r)$. Notice that the open squares $\{O_{i_j}\}$ with side $r$ centered at the points $X_{i_j}$ must be disjoint, and the area of each $O_{i_j}$ is $r^2$. Since these squares lie inside a square of side $1 + r$, a pigeonhole argument gives $m \le (1 + 1/r)^2$. This must be true for the largest independent set, so for all $n$ and $r$, $\beta_n(r) \le (1 + 1/r)^2$.
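Both dissection arguments can be replayed on a sample. In the sketch below, `largest_cell_clique` returns the most crowded cell of the $(l+1) \times (l+1)$ grid, which is a clique of size at least $n/(l+1)^2$, and `greedy_independent_set` builds an independent set greedily, whose size can never exceed $(1 + 1/r)^2$. Both names are ours, and the greedy procedure is only an illustration; it does not compute $\beta_n(r)$ exactly.

```python
import math
import random


def linf(p, q):
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def largest_cell_clique(points, r):
    """Most crowded cell of an (l+1) x (l+1) grid, l = floor(1/r); cells have side < r."""
    side_count = math.floor(1 / r) + 1
    cells = {}
    for p in points:
        key = (min(int(p[0] * side_count), side_count - 1),
               min(int(p[1] * side_count), side_count - 1))
        cells.setdefault(key, []).append(p)
    return max(cells.values(), key=len), side_count


def greedy_independent_set(points, r):
    """Keep a point whenever it is at L-infinity distance > r from every point kept so far."""
    chosen = []
    for p in points:
        if all(linf(p, q) > r for q in chosen):
            chosen.append(p)
    return chosen


rng = random.Random(11)
n, r = 1000, 0.13
pts = [(rng.random(), rng.random()) for _ in range(n)]
clique, s = largest_cell_clique(pts, r)
independent = greedy_independent_set(pts, r)
print(len(clique), n / s ** 2, len(independent), (1 + 1 / r) ** 2)
```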

In a sequel paper, Appel and Russo [AR97b] prove that under equation (7), the minimum degree $\delta_n(r_n)$ of $G_n(r_n)$ is asymptotically of order $\log n$.

5 The MinLA Problem

Given an undirected graph $G = (V, E)$ with $|V| = n$, a layout of $G$ is a one-to-one function $\varphi : V \to [n]$. The minimum linear arrangement problem, from now on denoted MinLA, is a combinatorial optimization problem formulated as follows: given a graph $G = (V, E)$, find a layout $\varphi$ of $G$ that minimizes the cost

$$\mathrm{LA}(G, \varphi) = \sum_{uv \in E} |\varphi(u) - \varphi(v)|.$$
MinLA is an interesting and appealing NP-hard problem with several applications in Computer Science and Biology [DPSS98]. However, exact polynomial-time algorithms exist for some particular classes of graphs. The lack of efficient exact algorithms for general graphs has motivated the search for approximation algorithms. An approximation scheme for MinLA on dense graphs was presented in [FK96]. Moreover, an $O(\log n)$ approximation for general graphs and an $O(\log\log n)$ approximation for planar graphs are proposed in [RR98]. A study of the $G_{n,p}$ model for sparse graphs is given in [DPST98]. An experimental analysis of several heuristics for this problem is given in [Pet98], where it is observed that approximating geometric random graphs is harder than approximating standard random graphs.

Let us consider geometric graphs $G_n(r)$ with $r = \sqrt{c \log n / n}$ for some constant $c \in \mathbb{R}^+$. As we saw in the previous section, whp such a graph is connected and its expected number of edges is $O(n \log n)$. Moreover, for a particular random graph, the actual number of edges is concentrated around the expectation. Random geometric graphs are not planar in general, so we can approximate MinLA on these graphs within $O(\log n)$ using the algorithm of [RR98]. However, using much more practical methods we can obtain weaker approximability results.

Let us consider the MinLA problem for such a graph. There are $n!$ different layouts for $G_n(r)$. We define the average linear arrangement cost as the normalized sum of the costs of all layouts. It is well known that, for any graph with $n$ vertices, the average length of an edge over all layouts is $(n+1)/3$. Thus we get:

Lemma 2. The expected value of the average linear arrangement cost for a graph $G \in G_n(r)$ is $O(n^2 \log n)$.
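For tiny graphs, the cost function, the average over all $n!$ layouts and the minimum can all be computed exhaustively, which makes the $(n+1)/3$ fact easy to check: the average cost is exactly $|E|(n+1)/3$. The graph below is a hypothetical example, and the exhaustive search is of course only feasible for very small $n$.

```python
import itertools
from fractions import Fraction


def la_cost(edges, layout):
    """Linear arrangement cost: sum over edges uv of |phi(u) - phi(v)|, with layout[v] = phi(v)."""
    return sum(abs(layout[u] - layout[v]) for u, v in edges)


def average_cost(n, edges):
    """Exact average of la_cost over all n! layouts phi: V -> {1, ..., n}."""
    total, count = 0, 0
    for perm in itertools.permutations(range(1, n + 1)):
        total += la_cost(edges, perm)
        count += 1
    return Fraction(total, count)


def min_cost(n, edges):
    """Exact MinLA value by exhaustive search (tiny graphs only)."""
    return min(la_cost(edges, perm) for perm in itertools.permutations(range(1, n + 1)))


n = 6
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 2)]  # hypothetical small graph
print(average_cost(n, edges), Fraction(len(edges) * (n + 1), 3), min_cost(n, edges))
```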

In the $G_{n,p}$ model, the average over all layouts is of the same magnitude as the minimum [DPST98]. However, this no longer holds for random geometric graphs: as we will see, the minimum value is of smaller magnitude.

We first derive a lower bound for the minimum linear arrangement that holds 
for almost all random geometric graphs and then analyze two heuristics to obtain 
an upper bound. 



The lower bound. Consider $[0,1]^2$ and dissect it into $\alpha n / \log n$ disjoint squares, each of size $\sqrt{\log n/(\alpha n)} \times \sqrt{\log n/(\alpha n)}$, for some fixed (but arbitrary) constant $\alpha$. A pigeonhole-type argument gives the following result.

Lemma 3. Whp, every square contains at least one vertex and at most $\alpha \log n$ vertices.



Using the first part of this lemma we can obtain a lower bound:

Theorem 2. There exists a constant $c_1$ such that whp, for any layout $\varphi$ of $G_n(r)$, the cost of $\varphi$ is at least $c_1 (n/\log n)^{3/2}$.
Proof. Conditioning on the event of Lemma 3, after dissecting the square $[0,1]^2$ we have at least one node in each square. For a suitable choice of $\alpha$, the only possible connections correspond to neighboring cells. Therefore, whp, our graph contains a square mesh with $\alpha n / \log n$ points. In [MD86] and [MP80] it is shown that the optimum linear arrangement of an $n \times n$ mesh has cost $0.868 n^3 + O(n^2)$. Applying this known optimal value to our mesh gives the claimed lower bound.

To obtain upper bounds we consider two natural heuristics for the MinLA problem on geometric graphs: the dissection and the projection heuristics.



The dissection heuristic. Given a geometric graph in $G_n(r)$:

1. Dissect $[0,1]^2$ into $\alpha(n/\log n)$ disjoint squares, each of size $\sqrt{\log n/(\alpha n)} \times \sqrt{\log n/(\alpha n)}$.

2. Assign consecutive numbers to all vertices in the same square. By our selection of $\alpha$, all vertices in the same subsquare are connected.

3. Sort the squares from top to bottom and from left to right, and follow this order to assign numbers to vertices.

Let $D_{G_n(r)}$ denote the cost of the layout obtained using dissection.
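A minimal Python sketch of the dissection heuristic follows. As assumptions of ours, it takes $r = \sqrt{c \log n / n}$ with $c = 2$ and uses a $g \times g$ grid with $g = \lceil 1/r \rceil$, so that cells have side at most $r$ and vertices sharing a cell are adjacent; the paper's choice of $\alpha$ plays the same role. Rows are numbered from the top, and vertices are sorted by (row, column). The function names are hypothetical.

```python
import math
import random


def sample_gnr(n, c=2.0, seed=0):
    """Sample G_n(r) with r = sqrt(c log n / n); edge iff the L-infinity distance is <= r."""
    rng = random.Random(seed)
    r = math.sqrt(c * math.log(n) / n)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    edges = [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if max(abs(pts[i][0] - pts[j][0]), abs(pts[i][1] - pts[j][1])) <= r
    ]
    return pts, edges, r


def dissection_layout(pts, r):
    """Number vertices cell by cell, cells in row-major order with the top row first."""
    g = math.ceil(1 / r)  # g x g cells of side 1/g <= r (assumed grid size)

    def cell(p):
        col = min(int(p[0] * g), g - 1)
        row = min(int((1 - p[1]) * g), g - 1)  # row 0 is the top strip
        return (row, col)

    order = sorted(range(len(pts)), key=lambda v: cell(pts[v]))
    layout = [0] * len(pts)
    for position, v in enumerate(order, start=1):
        layout[v] = position  # phi(v) = position
    return layout


def la_cost(edges, layout):
    """Sum over edges uv of |phi(u) - phi(v)|."""
    return sum(abs(layout[u] - layout[v]) for u, v in edges)


n = 400
pts, edges, r = sample_gnr(n, seed=1)
layout = dissection_layout(pts, r)
print(len(edges), la_cost(edges, layout), len(edges) * (n + 1) / 3)
```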



Theorem 3. There exists a constant $c_2$ such that whp

$$D_{G_n(r)} \le (c_2\, n \log n)^{3/2}.$$

Proof. Given $G_n(r)$ and a dissection as before, we construct a new graph $G^\alpha$ by adding extra vertices until every square contains $\log n$ vertices. The edges of $G^\alpha$ are all possible connections between vertices in the same square, together with all connections between the vertices of a square and the vertices in neighboring squares. Clearly this construction is not always possible, since a square may contain more than $\log n$ vertices, but by Lemma 3 we can ensure that whp $G^\alpha$ is a supergraph of $G_n(r)$.

Applying the dissection heuristic to $G^\alpha$ gives the upper bound.



The projection heuristic. The projection heuristic is also quite natural: the projection of each vertex onto the $x$-axis induces a natural linear ordering of the vertices. Another way to see this heuristic is to slide a vertical line from position 0 to 1 and number the vertices in the order in which the line touches them.

Let $P_{G_n(r)}$ denote the cost of the layout obtained using the projection heuristic.

Theorem 4. There exists a constant $c_3$ such that whp

$$P_{G_n(r)} \le (c_3\, n \log n)^{3/2}.$$
Proof. As the points are distributed uniformly, the expected length (in the projected layout) of any edge is $n$ times the distance between the projections of its two endpoints. Furthermore, the actual length is close to this average whp.

Therefore, as the distance between the projections is bounded by $r$, whp $P_{G_n(r)} \le n m r$, where $m$ is the number of edges. Substituting $m = O(n \log n)$ and $r = \sqrt{c \log n / n}$ gives the bound.
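The projection heuristic is even simpler to sketch: sort the vertices by their $x$-coordinate. The code below samples $G_n(r)$ in the same way as the dissection sketch (again assuming $c = 2$ and $\ell_\infty$ adjacency, with hypothetical function names) and compares the resulting cost with the crude bound $n m r$ from the proof and with the average cost $m(n+1)/3$ over all layouts.

```python
import math
import random


def sample_gnr(n, c=2.0, seed=0):
    """Sample G_n(r) with r = sqrt(c log n / n); edge iff the L-infinity distance is <= r."""
    rng = random.Random(seed)
    r = math.sqrt(c * math.log(n) / n)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    edges = [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if max(abs(pts[i][0] - pts[j][0]), abs(pts[i][1] - pts[j][1])) <= r
    ]
    return pts, edges, r


def projection_layout(pts):
    """Number vertices in increasing x order (a vertical line sweeping from 0 to 1)."""
    order = sorted(range(len(pts)), key=lambda v: pts[v][0])
    layout = [0] * len(pts)
    for position, v in enumerate(order, start=1):
        layout[v] = position
    return layout


def la_cost(edges, layout):
    """Sum over edges uv of |phi(u) - phi(v)|."""
    return sum(abs(layout[u] - layout[v]) for u, v in edges)


n = 400
pts, edges, r = sample_gnr(n, seed=2)
cost = la_cost(edges, projection_layout(pts))
print(cost, n * len(edges) * r, len(edges) * (n + 1) / 3)
```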

Combined with the lower bound, both heuristics provide an $O(\log^3 n)$ approximation algorithm for MinLA on almost all random geometric graphs. These algorithms have an approximation ratio worse than the $O(\log n)$ bound of [RR98]. However, although the algorithm of Rao and Richa is extremely ingenious, it has the drawback of using the ellipsoid method and is thus not feasible for large graphs. On the other hand, the algorithms described here are simple to apply.

6 Open Problems 

Concerning the MinLA problem, and analogously to other problems, there are several natural questions to ask:

— What is the right order of magnitude of the mean value of the minimum linear arrangement cost over all graphs in $G_n(r)$?

— Is the minimum layout cost on $G_n(r)$ concentrated around this mean or not?

— Finally, is there some BHH-type result for this measure? If so, one may ask what the limiting constant is.

There are some subtle impediments to answering these questions. Let us point out the main difficulties:

— Although, when we partition the graph into pieces, we can compute a layout from optimal layouts on the disjoint pieces, the additional cost is not a linear function of the piece size. Furthermore, the piece size must be related to the selected value of $r$ under some conditions.

— The measure we are considering is discrete: although distance in the geometric graph has some good properties, we measure integer distances. Therefore, scaling the values will not scale the minimum cost. The only property we can show is that if the transformation (scaling) creates a subgraph (supergraph) of our random graph, then the cost in the new scaled graph is upper (lower) bounded by the cost in the original graph.

— We lose monotonicity: notice that the sequence $\{r(n)\}$ that we choose to get random sparse geometric graphs is decreasing. Therefore adding a point may change the graph dramatically, so that the new graph is neither a supergraph nor a subgraph of the previous one.

On the other hand, random geometric graphs seem to be formed by small clusters that form cliques, expanded in the same fashion. Therefore it seems that they could also be related to some classes of bounded tree-width graphs. It is an open problem to analyze the expected tree-width of such graphs for different sequences of radii.
Abstract. In this paper we study a neural network model of self-organization. This model uses a variation of a Hebb rule for updating its synaptic weights, and surely converges to an equilibrium state. The key point of the convergence is the update rule that constrains the total synaptic weight, and this seems to make the model stable. We investigate the role of the constraint and show that it is the constraint that makes the model stable. To analyze this setting, we propose a simple probabilistic game that abstracts the neural network and the self-organization process. Then we investigate the characteristics of this game, namely, the probability that the game becomes stable and the number of steps it takes.

1 Introduction 

How does the brain establish connections between neurons? This question has been one of the important issues in Neuroscience, and theoretical researchers have proposed various models for the self-organization mechanisms of the brain. In many of these models, competitive learning, or more specifically, competitive variants of a Hebb rule, have been used as a key principle. In this paper, we study one property of such competitive Hebb rules.

As one typical example of self-organization, “orientation selectivity” [WH63] has been studied intensively. In the primary visual cortex (area 17) of cats, there is a group of neurons that react strongly to the presentation of light bars of a certain orientation; we call this orientation selectivity. An interesting point is that at a very early stage after birth, every neuron reacts to bars of every orientation. This indicates that orientation selectivity is acquired after birth; that is, each neuron selects one preferred orientation among all orientations. To explain the development of orientation selectivity, a considerable number of mathematical models have been investigated; see, e.g., [Swi96]. Although these models may look quite different, most of them use, as the principal rule for modifying synaptic strength, a competitive variant of a Hebb rule, which is essentially the same as the rule proposed in the pioneering paper of von der Malsburg [Mal73], the paper that first gave a mathematical model for the development of orientation selectivity.

A Hebb rule is a simple rule for updating, e.g., the weight of the connection between two neurons. The rule just says that the connection between two neurons is strengthened if they both become active simultaneously. This rule has been widely used for neural network learning. Von der Malsburg constrained this updating rule so that the total connection weight of one neuron is kept under some bound. In this paper, we call this variation of a Hebb rule a constrained Hebb rule. He showed through computer experiments that orientation selectivity does develop under his constrained Hebb rule.

Since the work of von der Malsburg, many models have been proposed, and 
some have been theoretically analyzed in depth; see, e.g., [Tan90]. For example, 
a feature of various constrained Hebb rules as a learning mechanism has been 
discussed in [MM94]. Yet, the question of why orientation selectivity is obtained 
by following a constrained Hebb rule has not been addressed. Note that the 
development of orientation selectivity is different from ordinary learning in the 
sense that a neuron (or, a group of neurons) establishes a preference to one 
particular orientation from given (more or less) uniformly random orientation 
stimuli. In this paper, we discuss why and how one feature from equally good 
features is selected with a constrained Hebb rule. 

In order to simplify our analysis, we propose a simple probabilistic game called the “monopolist game” that abstracts Hebb rules. In the monopolist game, an updating rule corresponds to the game's rule, and selectivity is interpreted as the emergence of a single winner of the game, a monopolist. We then prove that a monopolist emerges with probability one in games following a von der Malsburg type rule. On the other hand, we give theoretical evidence that (i) the chance of having a monopolist is low without any constraint, and (ii) a monopolist emerges even under a rule with a weaker constraint. These results indicate the importance of constraint in Hebb rules (or, more generally, of competition in learning) for selecting one feature from equally good features.

We also analyze how fast a monopolist emerges in games following a von der Malsburg type rule. In the future, this analysis can be used to estimate the convergence speed of constrained Hebb rules. (In this extended abstract, some of the proofs are omitted; see [DWY98] for those proofs.)

2 Von der Malsburg’s Model and Monopolist Game 

Here we first explain briefly the model considered by von der Malsburg. (Von der 
Malsburg studied the selectivity for a set of neurons, but here we only consider 
its basic component.) 

Neural Network Structure 

We consider a two-layer neural network. In particular, we discuss here the orientation selectivity of one neuron, and thus we assume that there is only one output cell. The input layer consists of 19 input cells that are (supposed to be) arranged in a hexagon like the ones in Figure 1. We use $i$ to index the $i$-th input cell, and IN for the set of all input cells.

Stimuli and Firing Rule 

We use 9 stimuli with different orientations (Figure 2), which are given to the 
network randomly. Here • indicates an input cell that gets input 1, and o indi- 
cates an input cell that gets input 0. 

[Figure 2 image: nine hexagonal arrays of 19 input cells, each showing a bar of a different orientation.]

Fig. 2. Nine stimuli.



We use $a_i$ to denote the input value (either 0 or 1) to the $i$-th input cell. The output value $v$ is computed as $v = \mathrm{Th}_p\big(\sum_{i \in \mathrm{IN}} w_i a_i\big)$, where $w_i$ is the current synaptic strength between the output cell and the $i$-th input cell. $\mathrm{Th}_p(x)$ is a threshold function that gives $x - p$ if $x > p$ and 0 otherwise, where $p$ is given as a parameter.

Updating Rule 

Initially, each weight $w_i$ is set to some random number between 0 and some constant. The weights are updated at each step according to the following variation of a Hebb rule, which we call the constrained Hebb rule (of von der Malsburg):

$$w'_i = w_i + C_{\mathrm{inc}}\, a_i v, \quad\text{and}\quad w_i = w'_i \times W_0 \Big/ \sum_{k \in \mathrm{IN}} w'_k,$$

where $C_{\mathrm{inc}}$ (called the growth rate) and $W_0$ (the total weight bound) are constants given as parameters. The first formula may be considered the original Hebb rule; the second one is introduced in order to keep the total weight within $W_0$. (In fact, the total weight is kept exactly at $W_0$.)
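One update of the constrained Hebb rule can be written directly from the two formulas above. In the sketch below, `a` is a binary input vector, `p` the threshold, `c_inc` the growth rate $C_{\mathrm{inc}}$ and `w0` the total weight $W_0$; the random stimuli in the usage lines are hypothetical placeholders, not the hexagonal patterns of Fig. 2. The second step rescales so that the weights again sum to $W_0$.

```python
import random


def threshold(x, p):
    """Th_p(x): x - p if x > p, and 0 otherwise."""
    return x - p if x > p else 0.0


def constrained_hebb_step(w, a, c_inc, w0, p):
    """One constrained Hebb update for a single output cell.

    w: current weights w_i; a: binary inputs a_i; returns weights rescaled to total w0.
    """
    v = threshold(sum(wi * ai for wi, ai in zip(w, a)), p)  # output value
    grown = [wi + c_inc * ai * v for wi, ai in zip(w, a)]  # plain Hebb growth
    total = sum(grown)
    return [gi * w0 / total for gi in grown]  # renormalise so the weights sum to w0


rng = random.Random(0)
n_inputs, W0 = 19, 19.0
stimuli = [[rng.randint(0, 1) for _ in range(n_inputs)] for _ in range(9)]  # hypothetical
w = [rng.uniform(0.5, 1.5) for _ in range(n_inputs)]
start_total = sum(w)
w = [wi * W0 / start_total for wi in w]  # start with total weight W0
for _ in range(2000):
    w = constrained_hebb_step(w, rng.choice(stimuli), c_inc=0.05, w0=W0, p=5.0)
print([round(wi, 2) for wi in w])
```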

With this setting, von der Malsburg demonstrated through computer simulations that selectivity develops. Thus, it seems likely that some selection occurs even from uniformly random examples, and that the constraint in von der Malsburg's rule is key to such a selection. In this paper we study this feature of the constrained Hebb rule. To this end, we further simplify von der Malsburg's computation model and propose the following simple probabilistic game.

Monopolist Game 

Basic Rule: Consider a finite number of players. Initially they are given the same amount of money. The game proceeds step by step and, at each step, one of the players wins, where all the players have the same winning probability. The winner gets some amount of money, while the others lose some.

Details: A player who loses all his money is called bankrupt. Once a player becomes bankrupt, he cannot get any money, though he can still win with the same probability. (See below for the motivation.)

Goal: The game terminates when all but one player are bankrupt. If the surviving player has enough money at that point, he is called a monopolist. We call a situation in which a monopolist appears a monopoly.

Notations. We use $n$ and $n'$ to denote, respectively, the number of initial players and the number of remaining (non-bankrupt) players at some fixed step, and use $i$, $1 \le i \le n$, for players' indices. Throughout this paper, each player's wealth is simply called a weight, and $w_i$ denotes the $i$-th player's current weight. Let $I$ and $W_0$ respectively denote the initial weight of each player and the total amount of initial weights; that is, $W_0 = nI$.

The connection of this game with von der Malsburg's computation model is clear: each player's weight corresponds to the total synaptic strength between the output cell and the set of input cells corresponding to one type of stimulus, and the emergence of a monopolist means that the network develops a preference for one orientation. From this correspondence, it is natural to require that even a bankrupt player can win with the same probability $1/n$, which reflects the fact that the probability that a stimulus of each orientation appears is the same no matter how the neural connections are organized.

An updating rule for the players' weights corresponds to a rule for changing synaptic strength in the network. We can state updating rules in the following way. (In the following, let $i_0$ denote the player who wins at the current step.)

$$w_i \leftarrow \begin{cases} w_i + f_{\mathrm{inc}} - f_{\mathrm{dec}} & \text{if } i = i_0, \text{ and}\\ w_i - f_{\mathrm{dec}} & \text{otherwise.}\end{cases}$$

Here $f_{\mathrm{inc}}$ and $f_{\mathrm{dec}}$ are the amounts of increment and decrement at each step, respectively, and one type of monopolist game is specified by defining $f_{\mathrm{inc}}$ and $f_{\mathrm{dec}}$. In the following, we assume that these values are determined from $w_i$, $w_{i_0}$, $n$, and $n'$. From the relation to von der Malsburg's computation model, we require that both $f_{\mathrm{inc}}$ and $f_{\mathrm{dec}}$ are 0 if $w_i = 0$; that is, once a player loses all his money, he stays in the 0 weight state forever. (In the following, we will omit stating this requirement explicitly.)
Now we consider the rule that corresponds to von der Malsburg's constrained Hebb rule. For a constant $C_{\mathrm{inc}}$ it is defined as follows:

$$f_{\mathrm{inc}} = C_{\mathrm{inc}}, \quad\text{and}\quad f_{\mathrm{dec}} = C_{\mathrm{inc}}/n'. \qquad (1)$$

(Recall that $n'$ is the number of currently remaining players.)

Note that with this rule the total amount of wealth is kept constant. Thus, in this sense, it corresponds to von der Malsburg's rule, and we call it the constrained rule. We may also consider a similar rule in which $f_{\mathrm{inc}}$ is not constant but proportional to $w_{i_0}$ (and, similarly, $f_{\mathrm{dec}}$ is proportional to $w_i$). This rule might be closer to von der Malsburg's original rule. The difference is, however, not essential for discussing the probability of having a monopolist, i.e., for our discussion in Section 3. On the other hand, there is a significant difference in convergence speed; but, roughly speaking, the difference disappears if we take the log of the weight. Thus, we will work with the simpler rule above.

3 Importance of Competition 

Here we compare three different updating rules for the monopolist game, and show that constraint in the rule is an important characteristic for deriving a monopolist. From this, we could infer that some sort of constraint (or, more generally, competition) is important in learning rules for selecting one feature among a set of features through a random process.

In the following, we consider three updating rules: (1) the constrained rule, (2) the local rule, and (3) the semi-local rule. Below we define these rules (except (1), which was defined in the previous section) and discuss the probability $P^*$ that a monopolist emerges.

Constrained Rule 

We show that under the constrained rule $P^*$ is 1, that is, a monopolist emerges with probability 1.

In general, a monopolist game can be expressed as a one-dimensional random walk. More precisely, for any $i$, we can express player $i$'s wealth $w_i$ as the following random walk.

[Figure 3 image: the weight $w_i$ drawn as a particle on an interval whose left end is 0; the left step is labelled $f_{\mathrm{dec}}$ and the right step $f_{\mathrm{inc}}$.]

Fig. 3. One-dimensional random walk.

Note that the particle (i.e., the weight $w_i$) moves to the left (resp., to the right) with probability $1 - 1/n$ (resp., $1/n$). The left (resp., right) end of the interval means that player $i$ becomes bankrupt (resp., a monopolist). Thus, these two ends are absorbing walls.

In a monopolist game under the constrained rule with $n = 2$, we have $f_{\mathrm{inc}} = C_{\mathrm{inc}}$ and $f_{\mathrm{dec}} = C_{\mathrm{inc}}/2$, so the particle moves right by $f_{\mathrm{inc}} - f_{\mathrm{dec}} = C_{\mathrm{inc}}/2$ and left by $f_{\mathrm{dec}} = C_{\mathrm{inc}}/2$, each with probability $1/2$. Hence the above random walk becomes the standard symmetric one (see, e.g., [Fel68]), and it is well known that the particle in such a random walk reaches one of the absorbing walls in a finite number of steps with probability 1. This proves that $P^* = 1$ when $n = 2$. Then, by induction, we can prove $P^* = 1$ when $n > 2$; thus, we have the following theorem.

Theorem 1. Under the constrained rule, a monopolist emerges (in a finite number of steps) with probability 1.

Local Rule 

In the constrained rule, computing $f_{\mathrm{dec}}$ requires the number of remaining 
players; that is, weights cannot be updated locally. In general, in order to be 
competitive, an updating rule must not be local. Thus, to see the importance of 
competition, we consider here the following purely local updating rule: 

$$f_{\mathrm{inc}} = c_{\mathrm{inc}}, \qquad f_{\mathrm{dec}} = c_{\mathrm{dec}}. \qquad (2)$$

Notice that for this local rule (and the next, semi-local rule), the notion of a 
monopolist is less clear than in the case of a game with the constrained rule, 
because the notion of “enough money” is not clear. Here we simply 
take it to be $W_0/2$, half of the total initial weight. That is, we regard a 
single survivor as a monopolist if his weight is more than $W_0/2$; hence, $P^*$ is the 
probability that the game reaches a state where $W_i > W_0/2$ for some $i$ 
and $W_j = 0$ for all other $j$. 

We first discuss one feature of this updating rule. In the following, let us fix 
$c_{\mathrm{dec}} = 1$. Our computer experiments show that the probability of having a single 
survivor (within a reasonable number of steps) drops rapidly when $c_{\mathrm{inc}} > n + 1$. The 
reason is clear from the following fact. 

Theorem 2. Fix $c_{\mathrm{dec}}$ to be one, and consider one player’s weight. For any $t$, 
after $t$ steps it increases by $t(c_{\mathrm{inc}}/n - 1)$ on average. 

Thus, if $c_{\mathrm{inc}} > n$, then it is quite likely that all players increase their weights, 
and thus no player goes bankrupt. On the other hand, if $c_{\mathrm{inc}} < n$, then 
every player dies quickly, and hence no monopolist occurs even though someone 
may become the last player. This means that the most crucial case is 
$c_{\mathrm{inc}} = n$. Next we discuss $P^*$ for this case. 
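Theorem 2 can be checked with a quick Monte Carlo sketch. It assumes a reading of rule (2) that the text does not spell out: in each step every player pays $c_{\mathrm{dec}}$, and the winner, chosen uniformly among the $n$ players, additionally receives $c_{\mathrm{inc}}$ (the semi-local rule below is described the same way, with a net gain of $c_{\mathrm{inc}} - c_{\mathrm{dec}}$ for the winner). Bankruptcy is ignored, so the sketch only illustrates the drift before anyone hits zero.

```python
import random


def predicted_drift(n: int, c_inc: float, t: int) -> float:
    """Theorem 2 with c_dec = 1: expected weight change after t steps."""
    return t * (c_inc / n - 1)


def simulated_drift(n: int, c_inc: float, t: int, trials: int,
                    c_dec: float = 1.0, seed: int = 0) -> float:
    """Average change of player 0's weight after t steps of the local rule.

    Assumption: everyone pays c_dec each step and the uniformly chosen
    winner also gets c_inc; bankruptcy is not modelled.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        wins = sum(1 for _ in range(t) if rng.randrange(n) == 0)
        total += c_inc * wins - c_dec * t
    return total / trials


if __name__ == "__main__":
    for c_inc in (3, 5, 8):   # below, at, and above the critical value n = 5
        print(c_inc, predicted_drift(5, c_inc, 100),
              round(simulated_drift(5, c_inc, 100, 2_000), 2))
```

The sign change of the drift at $c_{\mathrm{inc}} = n$ is exactly why that case is singled out as the crucial one.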

Recall that $P^*$ is the probability that, at some point in the game, all but one 
player become bankrupt and the survivor has weight $> W_0/2$. Since it is 
difficult to estimate $P^*$ directly, we analyze the following probability $P_1$ instead 
of $P^*$: $P_1$ is the probability that at least one player’s weight reaches $W_0/2$ and 
fewer than two players have weight larger than a sufficiently large value, say, 
$kW_0$ for some $k > 0$. Notice that if a monopolist emerges at some point, then 
clearly someone needs to reach $W_0/2$ in the game. Furthermore, it is unlikely 
that two players reach $kW_0$ and one of them becomes bankrupt afterwards. 
Thus, we may regard $P_1$ as an upper bound on $P^*$. For this $P_1$, we have the 
following bound. 
Theorem 3. For any $W_1$ and sufficiently large $W_2$, we have 

$$P_1 \le \cdots$$

[The right-hand side, an expression in $n$, $W_1$, and $W_2$, is not legible in the source.]



For example, by taking $W_1$ and $W_2$ as $W_0/2$ and $kW_0$, respectively, we have 
$P_1 < \cdots$, 

which is less than 0.6 if $k = 1$. On the other hand, our computer experiments 
show that $P^*$ is less than 0.5 for various sets of parameters. 
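A hedged Monte Carlo sketch for estimating $P^*$ under the local rule. The modelling details are assumptions chosen to match the discussion above: every surviving player pays $c_{\mathrm{dec}}$ per step; the winner is drawn uniformly from all $n$ original players (a purely local rule cannot know who is still alive, so a bankrupt player's win is wasted); bankrupt players stay at weight 0; and a run counts as a monopoly once a single survivor holds more than $W_0/2$. Runs still undecided after `max_steps` (a hypothetical cap) are counted as failures.

```python
import random


def estimate_p_star_local(n: int, init: int, c_inc: float, trials: int = 200,
                          c_dec: float = 1.0, max_steps: int = 20_000,
                          seed: int = 0) -> float:
    """Fraction of simulated local-rule games that end with a monopolist."""
    rng = random.Random(seed)
    w0 = n * init                       # total initial weight W_0
    monopolies = 0
    for _ in range(trials):
        w = [float(init)] * n
        alive = set(range(n))
        for _ in range(max_steps):
            winner = rng.randrange(n)   # assumption: may be a bankrupt player
            for i in list(alive):
                w[i] -= c_dec           # every survivor pays c_dec
                if i == winner:
                    w[i] += c_inc       # the winner also receives c_inc
                if w[i] <= 0:           # bankrupt: out of the game for good
                    w[i] = 0.0
                    alive.discard(i)
            if not alive:
                break                   # everybody died: no monopolist
            if len(alive) == 1 and w[next(iter(alive))] > w0 / 2:
                monopolies += 1
                break
    return monopolies / trials


if __name__ == "__main__":
    n = 4
    print(estimate_p_star_local(n, init=10, c_inc=n))   # the critical case c_inc = n
```

With $c_{\mathrm{inc}} = 0$ all players die in the same step, so the estimate is exactly 0; the interesting regime is $c_{\mathrm{inc}}$ near $n$.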

Semi Local Rule 

As a third updating rule, we consider a mixture of the above two rules. 
It keeps a certain amount of locality, but it still has some constraint. This rule 
is defined as follows: 

$$f_{\mathrm{inc}} = \min\Bigl(c_{\mathrm{inc}},\; W_0 - \sum_j W_j\Bigr), \qquad f_{\mathrm{dec}} = c_{\mathrm{dec}}. \qquad (3)$$

That is, we want to keep the total weight smaller than $W_0$, where $W_0$ is the 
total initial weight. Thus, a winner can gain $c_{\mathrm{inc}}$ (in net, $c_{\mathrm{inc}} - c_{\mathrm{dec}}$) if there is 
enough room to increase its weight. In this case, only the winner needs to know 
the current total weight, or the amount of room to the limit $W_0$, and the other 
players can update their weights locally. 
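One step of the semi-local rule, sketched in Python. It assumes that rule (3) reads $f_{\mathrm{inc}} = \min(c_{\mathrm{inc}}, W_0 - \sum_j W_j)$ and $f_{\mathrm{dec}} = c_{\mathrm{dec}}$, with the same step convention as the local rule (every surviving player pays $f_{\mathrm{dec}}$; the winner also receives $f_{\mathrm{inc}}$). Both are assumptions, and `semi_local_step` is a hypothetical name. Only the winner's update uses the global quantity `room`, which is the point of the rule.

```python
from typing import List


def semi_local_step(weights: List[float], winner: int, c_inc: float,
                    c_dec: float, w0: float) -> List[float]:
    """Apply one step of the (assumed) semi-local rule and return new weights.

    room  = W_0 minus the current total weight (only the winner needs it);
    f_inc = min(c_inc, room); every surviving player pays f_dec = c_dec.
    Players at weight 0 are bankrupt and are left unchanged.
    """
    room = w0 - sum(weights)
    f_inc = max(0.0, min(c_inc, room))  # total never exceeds W_0, so room >= 0
    new = []
    for i, w in enumerate(weights):
        if w <= 0:                      # bankrupt players stay out
            new.append(0.0)
            continue
        w -= c_dec
        if i == winner:
            w += f_inc
        new.append(max(w, 0.0))
    return new


if __name__ == "__main__":
    w = [5.0, 5.0]
    w = semi_local_step(w, winner=0, c_inc=4, c_dec=1, w0=10)   # no room: [4.0, 4.0]
    w = semi_local_step(w, winner=0, c_inc=4, c_dec=1, w0=10)   # room 2: [5.0, 3.0]
    print(w)
```

Because all survivors pay before any room opens up, the total weight drops by at least $n' c_{\mathrm{dec}}$ in a step where $W_0$ is already reached, which keeps it below $W_0$.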

Our computer experiments show that the probability $P^*$ that a monopolist 
emerges is fairly large if $c_{\mathrm{inc}}$ is large enough, say $c_{\mathrm{inc}} > 2n$. On the other hand, 
$P^*$ gets small when $c_{\mathrm{inc}}$ is small, which is explained in the same way as for the local 
rule. Although we have not been able to prove that $P^*$ is large for sufficiently 
large $c_{\mathrm{inc}}$, we can give an analytical result supporting it. 

Here, instead of analyzing $P^*$, we estimate (i) the average number of steps until 
all but one player become bankrupt, and (ii) the average number of steps until 
the total weight (which is initially $W_0$) becomes $W_0/2$. Let $T_{n\to 1}$ and $T_{W_0\to W_0/2}$ 
denote the former and the latter numbers, respectively. We prove below that 
$T_{n\to 1}$ is smaller than $T_{W_0\to W_0/2}$ if $W_0$ is large enough. This means that it is 
likely that, at the time when all but one player become bankrupt, the total 
weight, which is the same as the survivor’s weight, is larger than $W_0/2$; that is, 
the survivor is a monopolist. 

Theorem 4. Fix again $c_{\mathrm{dec}}$ to be one. For large $n$, if $I > (\ln \cdots)\, n(n-2)$ and 
$c_{\mathrm{inc}} > 2n$, then we have $T_{W_0\to W_0/2} > T_{n\to 1}$. 

4 Efficiency Analysis 

In this section we discuss how fast a monopolist emerges in games with the con- 
strained rule. We estimate an upper bound on the average number of steps 
needed for monopoly to emerge, and we give some justification (not a rigorous 
proof) that it is $O(n^3 \ln n\,(I/c_{\mathrm{inc}})^2)$. 




314 C. Domingo, O. Watanabe, and T. Yamazaki 

We start with some definitions and notation that are used throughout the sec- 
tion. Here we modify our monopolist game and define a variation of it. Let $\mathrm{game}_0$ 
denote the original monopolist game and let $\mathrm{game}_1$ denote a variant of $\mathrm{game}_0$ in 
which no bankrupt player can win. That is, in $\mathrm{game}_1$, the winning probability 
of each remaining player is $1/n'$ instead of $1/n$, where $n'$ is the number of remaining players. As we will see, $\mathrm{game}_1$ is useful 
for induction, and it is easier to analyze. 
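To experiment with $\mathrm{game}_1$, here is a hedged simulation sketch. It assumes the constrained rule in the form suggested by the weight formula used in the justification later in this section ($W_i = I_i + c_{\mathrm{inc}} t_i - (c_{\mathrm{inc}}/n)t$ before the first bankruptcy): the winner, drawn uniformly from the $n'$ remaining players, gains $c_{\mathrm{inc}}$, and every remaining player pays $c_{\mathrm{inc}}/n'$. Under that assumption the survivors' total weight never decreases, so at least one player is always left. `game1_run` and `mean_game1_steps` are hypothetical names.

```python
import random
from statistics import fmean
from typing import List, Optional, Tuple


def game1_run(n: int, init: float, c_inc: float, seed: Optional[int] = None,
              max_steps: int = 10**7) -> Tuple[int, List[int]]:
    """Play game_1 until one player is left; return (steps, survivors)."""
    rng = random.Random(seed)
    w = [float(init)] * n
    alive = list(range(n))
    steps = 0
    while len(alive) > 1 and steps < max_steps:
        steps += 1
        k = len(alive)                  # n', the number of remaining players
        winner = rng.choice(alive)      # each remaining player wins w.p. 1/n'
        for i in alive:
            w[i] -= c_inc / k           # assumption: decrement c_inc / n'
        w[winner] += c_inc
        alive = [i for i in alive if w[i] > 0]   # weight <= 0 means bankrupt
    return steps, alive


def mean_game1_steps(n: int, init: float, c_inc: float, trials: int,
                     seed: int = 0) -> float:
    """Monte Carlo estimate of E[T_1(n, I)] under the assumptions above."""
    rng = random.Random(seed)
    return fmean(game1_run(n, init, c_inc, seed=rng.randrange(2**32))[0]
                 for _ in range(trials))


if __name__ == "__main__":
    for n in (4, 8):
        print(n, mean_game1_steps(n, init=10, c_inc=5, trials=50))
```

Varying $n$, $I$, and $c_{\mathrm{inc}}$ gives a rough empirical feel for the scaling discussed in the Claim below, without proving anything.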

These two game types are defined on different probability spaces. Let us 
define them more precisely. For both game types, (the execution of) a game 
is specified by a game sequence, i.e., a string from $\{1, \ldots, n\}^*$ that records the 
history of winners. (Strictly speaking, we also need to consider infinite strings; 
but, as we will see below, we may ignore them.) We say that a game 
sequence $x$ kills a player $i$ if $W_i$ becomes 0 (or negative) in the game following $x$ 
just after the end of $x$, and we say that $x$ derives a monopolist if the second-to-last 
player is killed and monopoly emerges just after $x$. We say that a game sequence 
$x$ is valid (resp., strongly valid) if it derives a monopolist and no proper prefix of it 
derives a monopolist (resp., and, in addition, $x$ contains no indices of previously killed players). 
Note that the meaning of these notions may vary depending on the game type. Now, 
for any $n$, let $X_n$ (resp., $Y_n$) be the set of game sequences for $n$-player games 
that are strongly valid w.r.t. $\mathrm{game}_0$ (resp., valid w.r.t. $\mathrm{game}_1$). For each $x$ in 
$X_n$, its probability $\Pr\{x\}$ is …. On the other hand, the probability $\Pr\{y\}$ of 
$y \in Y_n$ depends on the number of remaining players, and it is rather complicated. 
(We omit specifying $\Pr\{y\}$ because it is not important for our discussion.) Note 
that $X_n$ and $Y_n$ are both prefix-free. Also, it is not hard to show that $\Pr\{X_n\}$ and 
$\Pr\{Y_n\}$ are both one. (For example, $\Pr\{X_n\} = 1$ follows from Theorem 1.) Therefore, 
we may regard $X_n$ and $Y_n$ as the probability spaces of the corresponding games, 
and we do not have to worry about infinite strings. 

We denote by $T(n, I_1, \ldots, I_n)$ (resp., $T_1(n, I_1, \ldots, I_n)$) the number of steps 
needed until monopoly emerges in $\mathrm{game}_0$ (resp., $\mathrm{game}_1$) with $n$ players and initial 
weights $I_1, \ldots, I_n$. When all the weights are equal, we use the simpler notation 
$T(n, I)$. Our goal is to get an upper bound on $E[T(n, I)]$. Instead of bounding it directly, we 
will derive an upper bound on $E[T_1(n, I)]$, which gives us an upper bound on 
$E[T(n, I)]$, as the following lemma guarantees. 

Lemma 5. There exists $C_1$ such that for any sufficiently large $n$ and any $I$, we 
have $E[T(n, I)] \le C_1 n\, E[T_1(n, I)]$. 

Now we analyze the convergence speed of $\mathrm{game}_1$. For our analysis, we split a 
game execution into stages, where each stage is the part of the game until 
some number of players become bankrupt. More specifically, we denote by 
$t_1(n, I_1, \ldots, I_n)$ the number of steps needed in a game with $n$ players and initial 
weights $I_1, \ldots, I_n$ until at least one player becomes bankrupt. The following lemma 
relates $T_1(n, I_1, \ldots, I_n)$ and $t_1(n, I_1, \ldots, I_n)$. 

Lemma 6. For any $n$ and $I_1, \ldots, I_n$, there exist a constant $c_2 \ge 1$ and 
weights $I'_1, \ldots, I'_{n-c_2}$ such that the following inequality holds: 

$$E[T_1(n, I_1, \ldots, I_n)] \le E[t_1(n, I_1, \ldots, I_n)] + E[T_1(n - c_2, I'_1, \ldots, I'_{n-c_2})].$$
Proof. Let $Y$ be the set of all valid game sequences $y$ such that the number 
of players becomes strictly smaller than $n$ for the first time just after $y$. By 
the definition of $E[T_1(n, I_1, \ldots, I_n)]$, we have the following equality: 

$$E[T_1(n, I_1, \ldots, I_n)] = \sum_{x \in Y_n} \Pr\{x\}\,|x| = \sum_{\substack{y \in Y,\ x \in Y_n \\ x = yz}} \Pr\{yz\}\,(|y| + |z|).$$

Notice here that we can split $\Pr\{yz\}$ into two factors, $\Pr\{y\}$ and $\Pr_y\{z\}$, where 
$\Pr_y\{z\}$ is the probability of $z$ after the game follows $y$. Also note that 
the set $Y_y$ of strongly valid $z$ depends on $y$. Thus, we can rewrite the above 
expression as follows: 

$$\begin{aligned}
&= \sum_{y \in Y}\sum_{z \in Y_y} \Pr\{y\}\Pr_y\{z\}\,|y| + \sum_{y \in Y}\sum_{z \in Y_y} \Pr\{y\}\Pr_y\{z\}\,|z| \\
&= \sum_{y \in Y} \Pr\{y\}\,|y| \Bigl(\sum_{z \in Y_y} \Pr_y\{z\}\Bigr) + \sum_{y \in Y} \Pr\{y\} \Bigl(\sum_{z \in Y_y} \Pr_y\{z\}\,|z|\Bigr) \\
&= E[t_1(n, I_1, \ldots, I_n)] + \sum_{y \in Y} \Pr\{y\}\, E[T_1(n_y, I_{j_1}(y), \ldots, I_{j_{n_y}}(y))] \\
&\le E[t_1(n, I_1, \ldots, I_n)] + E[T_1(n - c', I'_1, \ldots, I'_{n-c'})],
\end{aligned}$$

where the values of $n_y$ and $I_{j_1}(y), \ldots, I_{j_{n_y}}(y)$ are determined by the result of the 
game following $y$. On the other hand, $c'$ and $I'_1, \ldots, I'_{n-c'}$ are chosen so that the value of 
$E[T_1(n_y, I_{j_1}(y), \ldots, I_{j_{n_y}}(y))]$ is maximized. These values always exist since, even 
if infinitely many game sequences $y$ appear in the summation, 
there are only finitely many possible values for $n_y$ (since $n_y$ must be between 
1 and $n - 1$) and for the $I_j(y)$ (since $\sum_j I_j(y) = \sum_j I_j$). 




By this lemma we can use induction to bound the expected value of $T_1$. 
Recall that when analyzing $t_1(n, I_1, \ldots, I_n)$, by the way it is defined, no player 
becomes bankrupt, and thus the amount of decrement is fixed to $c_{\mathrm{inc}}/n$. Thus, 
$\mathrm{game}_1$ until at least one player becomes bankrupt can be regarded as an $n$-dimensional 
random walk, which is much easier to analyze. In fact, we can use the following 
lemma. 

Lemma 7. Let $X$ be a random variable that is 1 with probability $1/n$ and 0 
with probability $1 - 1/n$, and let $S = X_1 + \cdots + X_t$, the sum of the outcomes of $t$ 
random trials of $X$. Then, for some constant $a > 0$, the following holds for any 
$t$ and $n$: 

$$\Pr\Bigl\{ S \le \frac{t}{n} - a\sqrt{\frac{t}{n}} \Bigr\} > \frac{1}{3}.$$



Proof. We estimate the probability that the statement of the lemma is false and 
show that it is less than 2/3. That is, we upper bound the following probability: 

$$\Pr\Bigl\{ S > \frac{t}{n} - a\sqrt{\frac{t}{n}} \Bigr\} = \Pr\Bigl\{ S > \frac{t}{n} \Bigr\} + \Pr\Bigl\{ \frac{t}{n} \ge S > \frac{t}{n} - a\sqrt{\frac{t}{n}} \Bigr\}.$$

The first term in the sum is bounded by 1/2; see, e.g., [JS72]. Let $s$ be the 
smallest integer such that $s > t/n - a\sqrt{t/n}$. We calculate the second term of the above sum, 
using Stirling’s approximation, as follows: 

$$\Pr\Bigl\{ \frac{t}{n} \ge S > \frac{t}{n} - a\sqrt{\frac{t}{n}} \Bigr\} = \sum_{i=s}^{t/n} \binom{t}{i} \frac{(n-1)^{t-i}}{n^t} \approx \sum_{i=s}^{t/n} \sqrt{\frac{t}{2\pi i (t-i)}} \left(\frac{t/n}{i}\right)^{i} \left(\frac{t - t/n}{t - i}\right)^{t-i}.$$

Also, routine calculations show that $\left(\frac{t/n}{i}\right)^{i} \left(\frac{t - t/n}{t - i}\right)^{t-i}$ is at most 1 for 
$s \le i \le t/n$, and that this factor is maximized when $i = t/n$. From this, by a simple 
calculation, we obtain the desired bound with $a = \sqrt{\pi}/12$. 
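Lemma 7 can be probed numerically. The sketch below assumes $n \ge 2$ and uses the constant $a = \sqrt{\pi}/12$ from the end of the proof; it evaluates $\Pr\{S \le t/n - a\sqrt{t/n}\}$ exactly for $S \sim \mathrm{Binomial}(t, 1/n)$, summing the probability mass function in log space with `math.lgamma` to avoid overflow. It only checks the parameter pairs it is given, so it illustrates the lemma rather than proving it.

```python
import math

A = math.sqrt(math.pi) / 12       # the constant a from the proof of Lemma 7


def binom_cdf(t: int, p: float, k_max: int) -> float:
    """Pr{S <= k_max} for S ~ Binomial(t, p), with 0 < p < 1."""
    if k_max < 0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for k in range(min(k_max, t) + 1):
        log_pmf = (math.lgamma(t + 1) - math.lgamma(k + 1) - math.lgamma(t - k + 1)
                   + k * log_p + (t - k) * log_q)
        total += math.exp(log_pmf)
    return total


def lemma7_probability(t: int, n: int, a: float = A) -> float:
    """Pr{S <= t/n - a*sqrt(t/n)} where S counts t trials with success prob. 1/n."""
    mean = t / n
    return binom_cdf(t, 1.0 / n, math.floor(mean - a * math.sqrt(mean)))


if __name__ == "__main__":
    for t, n in [(100, 2), (1000, 10), (10_000, 100)]:
        print(t, n, round(lemma7_probability(t, n), 4))
```

For moderately large $t/n$ the computed probabilities sit comfortably above 1/3, because the threshold is only a small fraction of a standard deviation below the mean.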

Now we are ready to make the following claim.¹ 

Claim. 

$$E[T(n, I)] = O\Bigl(n \ln n \Bigl(\frac{W_0}{c_{\mathrm{inc}}}\Bigr)^2\Bigr).$$

Justification. We start by estimating $E[t_1(n, I_1, \ldots, I_n)]$ using the above 
lemma. For a given $t$ and any $i$, let $t_i$ be the number of times that player $i$ 
wins within $t$ steps. Then $W_i$, the weight of player $i$ in $\mathrm{game}_1$, is expressed as 
follows: 

$$W_i = I_i + c_{\mathrm{inc}} t_i - f_{\mathrm{dec}}\, t = I_i + c_{\mathrm{inc}} t_i - \frac{c_{\mathrm{inc}}}{n}\, t.$$

(To simplify our notation, we use $c$ to denote $c_{\mathrm{inc}}$ in the following.) 

Moreover, since $\mathrm{game}_1$ until at least one player becomes bankrupt can be regarded 
as an $n$-dimensional random walk, we can use Lemma 7 to show that the following 
event happens with probability greater than 1/3: 

$$W_i = I_i + c\Bigl(t_i - \frac{t}{n}\Bigr) \le I_i + c\Bigl(\frac{t}{n} - a\sqrt{\frac{t}{n}} - \frac{t}{n}\Bigr) = I_i - ca\sqrt{\frac{t}{n}}.$$

¹ We do not have a rigorous proof for this result, and for this reason we state it as a 
claim. 
Therefore, with probability more than 1/3, the weight of player $i$ becomes 
zero or negative if $ca\sqrt{t/n} > I_i$, that is, $t > (I_i/(ca))^2 n$. Now sort the players by 
their initial weights, and define $P$ to be the set of the first (i.e., the smallest) 
$n/2$ players. Since the total weight is $W_0$ (at any step), all players in $P$ have 
weight at most $2W_0/n$, and therefore, 

$$\Pr\Bigl\{ W_i \le 0 \text{ in } t_0 = n\Bigl(\frac{2W_0}{nca}\Bigr)^2 \text{ steps} \Bigm| i \in P \Bigr\} > \frac{1}{3}.$$



Moreover, if we can assume that each player in $P$ becomes bankrupt indepen- 
dently, we also have the following probability: 

$$\Pr\{ \text{there exists } i \in P \text{ such that } W_i \le 0 \text{ in } t_0 \text{ steps} \} > 1 - \Bigl(\frac{2}{3}\Bigr)^{n/2}.$$

From this observation, it is reasonable to bound $E[t_1(n, I_1, \ldots, I_n)]$ by $C_3 t_0$ 
for some constant $C_3$, since this bound holds for most of the valid game sequences 
(a $1 - (2/3)^{n/2}$ fraction of them). 

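Now, combining the above lemmas and the obtained bound, we have 

$$\begin{aligned}
E[T_1(n, I)] &\le E[t_1(n, I_1, \ldots, I_n)] + E[T_1(n - c_2, I'_1, \ldots, I'_{n-c_2})] \\
&\le C_3 n \Bigl(\frac{2W_0}{nca}\Bigr)^2 + E[T_1(n - c_2, I'_1, \ldots, I'_{n-c_2})] \\
&\le C_3 n \Bigl(\frac{2W_0}{nca}\Bigr)^2 + C_3 (n-1) \Bigl(\frac{2W_0}{(n-1)ca}\Bigr)^2 + E[T_1(n - c', I'_1, \ldots, I'_{n-c'})] \\
&\le \cdots \\
&\le C_3 \ln n \Bigl(\frac{2W_0}{ca}\Bigr)^2 \le C_4 \ln n \Bigl(\frac{W_0}{c}\Bigr)^2, \quad \text{for some constant } C_4.
\end{aligned}$$

From this and Lemma 5, we obtain the desired bound. 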
CONSTRUCTIVE BOUNDS AND EXACT EXPECTATIONS 
FOR THE RANDOM ASSIGNMENT PROBLEM 

DON COPPERSMITH AND GREGORY B. SORKIN† 



Abstract 



The random assignment problem is to choose a minimum-cost perfect matching in a complete 
$n \times n$ bipartite graph, whose edge weights are chosen randomly from some distribution such as 
the exponential distribution with mean 1. In this case it is known that the expectation does not 
grow unboundedly with $n$, but approaches a limiting value $c^*$ between 1.51 and 2. The limit is 
conjectured to be $c^* = \pi^2/6$, while a recent conjecture has it that for finite $n$, the expected cost 
is $\mathbb{E}A^* = \sum_{i=1}^{n} 1/i^2$. 

By defining and analyzing a constructive algorithm, we show that the limiting expectation is 
$c^* < 1.94$. In addition, we generalize the finite-$n$ conjecture to partial assignments on complete 
$m \times n$ bipartite graphs, and prove it in some limited cases. A full version of our work is available 
as [CS98]. 



1. Introduction 

The assignment problem, a special case of the transportation problem, is to find a minimum-cost 
perfect matching in an edge-weighted bipartite graph. The expected cost of a minimum assignment, 
in a randomized setting, seems first to have been considered by Donath [Don69]. If $A$ is a complete 
$n \times n$ bipartite graph whose edge weights are independent and identically distributed (i.i.d.) random 
variables (r.v.’s) in the interval $(0,1)$, Donath observed through simulations that, as $n \to \infty$, the 
cost appeared to tend to some constant $c^*$ between 1 and 2. 

Walkup proved that $\limsup_{n\to\infty} \mathbb{E}A^* \le 3$, by showing that for large $n$ a certain “2-out” subgraph 
containing only edges of small weight is likely to contain a perfect matching [Wal79]. The method was 
made constructive (and efficient) by Karp, Kan, and Vohra [KKV94]. Karp [Kar84, Kar87] showed 
that $\mathbb{E}A^* \le 2$, by analyzing the problem’s linear program (LP) and its dual; the method was quickly 
generalized by Dyer, Frieze, and McDiarmid [DFM86]. 

Lazarus proved that $\liminf_{n\to\infty} \mathbb{E}A^* \ge 1 + 1/e$ [Laz79, Laz93]. His method can be viewed as 
exploiting the dual LP, but is also easy to understand naively; we will develop the result later. 
Lazarus’s methods were extended at about the same time by Goemans and Kodialam [GK93], who 
showed a lower bound of 1.41, and by Olin [Oli92], who showed a lower bound of 1.51. 

On a different front, Aldous [Ald92] proved that as $n \to \infty$, $A^*$ converges to a well-defined limit 
$c^*$, in distribution as well as in expectation. Aldous also observed that as $n \to \infty$, the uniform $(0,1)$ 
distribution and the exponential distribution with mean 1 (probability density $p(x) = \exp(-x)$) 
are equivalent. Intuitively, a good assignment uses only small values, so only the distribution’s 
density near 0 is relevant; these two distributions both have density 1 there. Use of the exponential 
distribution instead of the uniform vastly simplifies the results of all the works cited, both because the 
minimum of $n$ mean-1 exponentials is a mean-$1/n$ exponential, and because an exponential random 
variable $X$ conditioned by $X \ge c$ has the property that $X - c$ is an (unconditioned) exponential 
random variable. 

Most tantalizingly, Mézard and Parisi [MP85, MP87] used the “replica method” of statistical 
physics to calculate that $c^* = \pi^2/6$; the method is not mathematically formal, but its results have been 
borne out in a variety of cases. Parisi [Par98] made the further conjecture that for an $n \times n$ instance 
$A$ with i.i.d. exponential costs (the distribution is critical in the finite case), $\mathbb{E}A^* = \sum_{i=1}^{n} 1/i^2$. Since 
it is well known that $\sum_{i=1}^{\infty} 1/i^2 = \pi^2/6$, a proof of this conjecture would immediately validate the 
replica calculation. 

In this paper we describe an algorithm whose analysis yields an upper bound of c* < 1.94. Also, 
we generalize Parisi’s conjecture, and prove it in a number of cases. 

2. Lower bounds 

We begin with a review of Lazarus’s 1 + 1/e lower bound, both to introduce some basic principles 
and because its construction is the starting point for our assignment algorithm and upper bound. We 
will need two simple properties of the exponential distribution: 

Property 1. The minimum of a collection of $n$ i.i.d. exponential r.v.’s with mean 1 is an exponential 
r.v. with mean $1/n$. 

Property 2. For an exponential r.v. $X$ with parameter $\lambda$ (density $p(x) = \lambda e^{-\lambda x}$), for any $s, t \ge 0$, 

$$\Pr\{X > s + t \mid X > t\} = \Pr\{X > s\}.$$

(That is, conditional upon $X \ge t$, the distribution of $X - t$ is 
again exponential, with the same parameter.) 
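Both properties are easy to check empirically. The Monte Carlo sketch below (function names are illustrative) uses the standard library's `random.expovariate`: the mean of the minimum of $n$ mean-1 exponentials should be close to $1/n$ (Property 1), and the conditional tail $\Pr\{X > s + t \mid X > t\}$ should match $\Pr\{X > s\} = e^{-s}$ (Property 2 with $\lambda = 1$).

```python
import math
import random
from statistics import fmean


def mean_of_minimum(n: int, trials: int, seed: int = 0) -> float:
    """Property 1: E[min of n i.i.d. Exp(1)] should be 1/n."""
    rng = random.Random(seed)
    return fmean(min(rng.expovariate(1.0) for _ in range(n)) for _ in range(trials))


def memoryless_tails(s: float, t: float, trials: int, seed: int = 0):
    """Property 2: return (Pr{X > s+t | X > t}, Pr{X > s}) estimated from samples."""
    rng = random.Random(seed)
    xs = [rng.expovariate(1.0) for _ in range(trials)]
    survivors = [x for x in xs if x > t]
    conditional = sum(x > s + t for x in survivors) / len(survivors)
    unconditional = sum(x > s for x in xs) / len(xs)
    return conditional, unconditional


if __name__ == "__main__":
    print(mean_of_minimum(10, 20_000), 1 / 10)
    print(memoryless_tails(0.5, 1.0, 200_000), math.exp(-0.5))
```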

All existing lower bounds on expected assignment cost are based on feasible (not necessarily op- 
timal) solutions to an assignment problem’s dual LP. The key is the following transformation of an instance 
$A$: 

Lemma 3. For real $n$-vectors $u$ and $v$, if $A - u\mathbf{1}^T - \mathbf{1}v^T = A'$ (that is, $a_{ij} - u_i - v_j = a'_{ij}$), then 
$$A^* = A'^* + \sum_{i=1}^{n} u_i + \sum_{j=1}^{n} v_j.$$

Proof. Since any assignment $\sigma$ selects precisely one value from each row and each column, 
$\mathrm{cost}(A, \sigma) = \mathrm{cost}(A', \sigma) + \sum_i u_i + \sum_j v_j$. Thus the minimum-cost assignments for $A$ and $A'$ are 
achieved by the same $\sigma$, and $A^* = A'^* + \sum_i u_i + \sum_j v_j$. □ 

Corollary 4. If $A - u\mathbf{1}^T - \mathbf{1}v^T = A'$, and $A'$ is elementwise non-negative, then $A^* \ge \sum_i u_i + \sum_j v_j$. 

Lemma 5. $\mathbb{E}A^* \ge 1$. 

Proof. Let $u$ be the vector of row minima, and $v = 0$. By Property 1, $\mathbb{E}u_i = 1/n$. Applying Lemma 3 
and Corollary 4, and taking expectations, $\mathbb{E}A^* = \mathbb{E}A'^* + \mathbb{E}\sum_i u_i \ge n \cdot 1/n = 1$. □ 
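Lemma 3 and Corollary 4 can be verified on a small instance by brute force. The sketch enumerates all $n!$ permutations with `itertools.permutations`, so it is only meant for tiny $n$; it subtracts row minima and then column minima, the same dual vectors used in Lemma 5 and Theorem 6. The helper names are illustrative.

```python
import itertools
import random
from typing import List, Sequence

Matrix = List[List[float]]


def min_assignment_cost(a: Matrix) -> float:
    """A*: minimum over all permutations sigma of sum_i a[i][sigma(i)]."""
    n = len(a)
    return min(sum(a[i][p[i]] for i in range(n))
               for p in itertools.permutations(range(n)))


def subtract_duals(a: Matrix, u: Sequence[float], v: Sequence[float]) -> Matrix:
    """A' = A - u 1^T - 1 v^T, i.e. a'_ij = a_ij - u_i - v_j."""
    n = len(a)
    return [[a[i][j] - u[i] - v[j] for j in range(n)] for i in range(n)]


def check_lemma3(n: int = 6, seed: int = 0):
    """Return (A*, A''* + sum u + sum v, sum u + sum v) for a random instance."""
    rng = random.Random(seed)
    a = [[rng.expovariate(1.0) for _ in range(n)] for _ in range(n)]
    u = [min(row) for row in a]                                # row minima
    a1 = subtract_duals(a, u, [0.0] * n)
    v = [min(a1[i][j] for i in range(n)) for j in range(n)]   # column minima of A'
    a2 = subtract_duals(a, u, v)                               # elementwise >= 0
    lhs = min_assignment_cost(a)
    rhs = min_assignment_cost(a2) + sum(u) + sum(v)
    return lhs, rhs, sum(u) + sum(v)


if __name__ == "__main__":
    print(check_lemma3())
```

The first two numbers agree up to rounding (Lemma 3), and the third, the dual value, is a lower bound on the first (Corollary 4).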

Theorem 6 (Lazarus). $\liminf_{n\to\infty} \mathbb{E}A^* \ge 1 + 1/e$. 

Proof. As above, subtract the row minima $u_i$ in $A$ to give $A'$ and conclude that $\mathbb{E}A^* = 1 + \mathbb{E}A'^*$. 
By Property 2, subtracting $u_i$ from the values $a_{ij}$ other than the row-$i$ minimum produces a collection 
of i.i.d. exponential r.v.’s $a'_{ij} = a_{ij} - u_i$.¹ Thus $A'$ is (like $A$) an i.i.d. exponential matrix, except 

that in each row it has a single zero, in a random column. 

In the large-$n$ limit, the probability that a column contains no zeros is $1/e$. Subtract the column 
minima $v_j$ in $A'$ to give $A''$. If column $j$ contained a zero, $v_j = 0$; otherwise (by Property 1) $v_j$ has 

¹ This argument, based on the exponential distribution, is easier than Lazarus’s for the uniform distribution. The 
memoryless property of the exponential means that when row and/or column minima are subtracted from an i.i.d. 
exponential matrix, the result is again i.i.d. exponential (outside of the zeros); this “self-reducibility” is most useful. 
(The matrix $A$ and its “residue”, after subtraction of the row and column minima, are far from independent of one 
another, but that is not a concern.) 
exponential distribution with mean $1/n$. So in the large-$n$ limit, 

$$\mathbb{E}\sum_j v_j = n \cdot \frac{1}{e} \cdot \frac{1}{n} = \frac{1}{e},$$

and by Lemma 3, 

$$\mathbb{E}A^* = \mathbb{E}A''^* + \mathbb{E}\sum_i u_i + \mathbb{E}\sum_j v_j \ge 1 + 1/e. \qquad (2)$$

□ 
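The $1/e$ in the proof is the limiting probability that a column receives none of the $n$ row-minimum zeros. Because the row minima of an i.i.d. continuous matrix sit in independent, uniformly random columns, the sketch below simulates only those positions rather than a full matrix. For finite $n$ the exact probability is $(1 - 1/n)^n$; combined with Property 1, as in the proof, this gives $\mathbb{E}\sum_j v_j = (1 - 1/n)^n$, which tends to $1/e$. Function names are illustrative.

```python
import math
import random


def zero_free_column_fraction(n: int, trials: int = 20, seed: int = 0) -> float:
    """Fraction of columns containing no row minimum, averaged over trials."""
    rng = random.Random(seed)
    empty = 0
    for _ in range(trials):
        hit = {rng.randrange(n) for _ in range(n)}   # column of each row's minimum
        empty += n - len(hit)
    return empty / (trials * n)


def exact_zero_free_probability(n: int) -> float:
    """Pr{a given column receives no row minimum} = (1 - 1/n)^n."""
    return (1 - 1 / n) ** n


if __name__ == "__main__":
    n = 2000
    print(zero_free_column_fraction(n), exact_zero_free_probability(n), 1 / math.e)
    print("row + column dual value for this n:", 1 + exact_zero_free_probability(n))
```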



Olin’s improved bound $\liminf_{n\to\infty} \mathbb{E}A^* \ge 1.51$ takes off from inequality (2), replacing $\mathbb{E}A''^* \ge 0$ 
with a stronger bound by reasoning about groups of rows whose zeros all lie in the same column. 



3. Outline of Sections 4-6 

The lower bounds are based on constructing feasible solutions to the assignment problem’s dual 
LP; there is a certain satisfaction in the constructions, and the sense that their continuing incremental 
improvement might result in lower bounds arbitrarily close to the true value. By contrast, the upper 
bound has rested at exactly 2 for over a decade, and is not constructive. 

We derive a smaller upper bound by defining and analyzing an assignment algorithm which, applied 
to a large i.i.d. exponential matrix $A$, produces an assignment with expected cost less than 1.94. 
There are three key points. (1) We begin with a partial assignment of size about $.81n$ implicit in 
Lazarus’s lower-bound construction. (2) Completing even a partial assignment of size $n - 1$ to a 
complete assignment in the naive manner would increase the assignment cost by 1, which, when we 
are gunning for $\pi^2/6$, is intolerable. However, we show how an $(n-1)$-assignment can be completed 
to an $n$-assignment quite simply, with additional cost $o(1)$. Dogged extension of the technique allows 
us to complete an initial assignment of size about $.81n$ to an $n$-assignment with additional cost of 
only about 0.56 altogether. (3) In both the initial construction and the successive augmentations, 
the “working” matrix will consist entirely of zeros and i.i.d. exponential values. 



4. Initial assignment 

Begin with an i.i.d. exponential matrix $A$ and, following Lazarus’s lead, form a “reduced” 
matrix $A'$ according to the following algorithm: 

Algorithm 0 

Input: An $n \times n$ matrix $A$ of i.i.d. exponential elements. 

For $i = 1 \ldots n$: 
    Subtract from row $i$ its minimum value $u_i$. 
For $j = 1 \ldots n$: 
    Subtract from column $j$ its minimum value $v_j$. 
For $i = 1 \ldots n$: 
    For $j = 1 \ldots n$: 
        If element $(i,j)$ is zero, and has other zeros in its row and in its column, 
        replace it with an exponential r.v. 
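A direct, unoptimized Python transcription of Algorithm 0 (plain lists, $O(n^2)$ memory; the function name `algorithm0` is ours). Exact zeros are detected with `== 0.0`, which is safe here because subtracting a float from itself gives exactly 0.0 and, with continuous inputs, ties have probability zero. Per-row and per-column zero counts are maintained so that the final double loop follows the algorithm's row-major order.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def algorithm0(a: Matrix, rng: random.Random) -> Tuple[Matrix, List[float], List[float]]:
    """Return (A', u, v) where u, v are the row and column minima subtracted."""
    n = len(a)
    u = [min(row) for row in a]
    b = [[x - u[i] for x in a[i]] for i in range(n)]            # row phase
    v = [min(b[i][j] for i in range(n)) for j in range(n)]
    for i in range(n):                                             # column phase
        for j in range(n):
            b[i][j] -= v[j]                 # v[j] == 0 if column j already has a zero
    row_zeros = [sum(1 for x in b[i] if x == 0.0) for i in range(n)]
    col_zeros = [sum(1 for i in range(n) if b[i][j] == 0.0) for j in range(n)]
    for i in range(n):                                             # cleanup phase
        for j in range(n):
            if b[i][j] == 0.0 and row_zeros[i] > 1 and col_zeros[j] > 1:
                b[i][j] = rng.expovariate(1.0)    # fresh exponential r.v.
                row_zeros[i] -= 1
                col_zeros[j] -= 1
    return b, u, v


if __name__ == "__main__":
    rng = random.Random(1)
    n = 50
    A = [[rng.expovariate(1.0) for _ in range(n)] for _ in range(n)]
    B, u, v = algorithm0(A, rng)
    print(sum(u) + sum(v), sum(x == 0.0 for row in B for x in row))
```

A zero is replaced only while other zeros remain in both its row and its column, so every row and column keeps at least one zero, matching the first assertion of Theorem 7.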



A typical reduced matrix is depicted in Figure 1, and is described by the following theorem: 

Theorem 7. For $n$ asymptotically large, Algorithm 0 reduces an $n \times n$ matrix $A$ of i.i.d. exponential 
r.v.’s to a matrix $A'$ in which 

• there is at least one zero in each row and in each column; 

• a zero may share either its row or its column with other zeros, but not both; 

• the nonzero elements are i.i.d. exponential random variables; 



and, with probability $1 - o(1)$, 

• the fraction of columns having $k > 1$ zeros is asymptotically equal to 
$$\mathrm{col}(k) = \mathrm{Pois}_k(e^{-1/e});$$

• the fraction of rows having $k > 1$ zeros is asymptotically equal to 
$$\mathrm{row}(k) = (1 - p^*)\,\mathrm{Pois}_k(1/e) + p^*\,\mathrm{Pois}_{k-1}(1/e),$$
where $p^* = \cdots$; and 

• the fraction of zeros unique in their row and column is asymptotically equal to 
$$f(1) = \cdots \approx .4914.$$

Figure 1. The zero structure of the matrix resulting from Algorithm 0: subtraction 
of row and column minima, and replacement of selected zeros (such as in the spot 
marked with an “x”) by fresh random variables. 



Proof. The first two assertions hold by construction: a zero is generated in every row and every 
column, and zeros are removed when they share both their column and their row with others. The third 
assertion follows from Property 1. 

When row minima are subtracted, the number of zeros contained in any column $j$ is the sum 
of $n$ Bernoulli$(1/n)$ random variables; this has binomial distribution $B(n, 1/n)$, which (as $n \to \infty$) 
converges in distribution to Pois(1). Immediately, the expected number of columns having $j$ 
zeros is $n\,\mathrm{Pois}_j(1)$. The actual number is tightly concentrated about this expectation, by Azuma’s 
inequality [McD89]: all statements in the remainder of the proof will implicitly be “almost surely” 
and “almost exactly”. 

From the above, $n\,\mathrm{Pois}_0(1) = n/e$ columns contain no row-induced zeros, so subtracting column 
minima introduces $n/e$ new zeros. They fall in random rows, with a density of $1/e$ new zeros per 
row, so the number of column-induced zeros in each row is distributed as Pois$(1/e)$. 

Consider the number of columns with $k > 1$ zeros. A column only acquires multiple zeros in the 
row phase; a zero is removed if any of the $n/e$ column zeros falls into the same row and if the zero is 
not the last remaining in its column. Forgetting this exceptional “and” only results in miscounting 
columns with no or one zeros. Then the probability that a zero is allowed to remain is the probability 
that no column zero falls into its row, which is $\mathrm{Pois}_0(1/e) = e^{-1/e}$. Thus a column may be modeled 
as having $n$ slots, into each of which a zero is placed w.p. $1/n$, and allowed to remain w.p. $e^{-1/e}$; this 
is equivalent to just placing zeros w.p. $e^{-1/e}/n$, which for $n$ large is the Poisson process Pois$(e^{-1/e})$. 
So the fraction of columns containing $k > 1$ zeros is $\mathrm{col}(k) = \mathrm{Pois}_k(e^{-1/e})$. 

Counting rows with $k > 1$ zeros is a little bit trickier but follows a similar line. (See [CS98] for 
details.) 
Figure 2. At left, the naive completion of an $(n-1)$-assignment adds expected 
cost $\mathbb{E}a_{nn} = 1$. At center, adding the cheapest pair $a_{in} + a_{ni}$ and deleting $a_{ii}$ adds 
expected cost $O(\sqrt{1/n})$. At right, with $s = \lfloor\sqrt{2n}\rfloor$, finding the $s$ smallest elements 
in column $n$ (depicted without loss of generality as elements $1 \ldots s$), then choosing 
the smallest corresponding element $j$ of row $n$, gives a pair with expected cost $\approx \sqrt{2/n}$. 



This leaves rows which neither have 2 or more zeros, nor are associated with a column having 
2 or more zeros. These are counted by $f(1) = 1 - \sum_{k>1} k\,\mathrm{col}(k) - \sum_{k>1} \mathrm{row}(k)$, which is a 
straightforward calculation with Poisson distributions. □ 

Corollary 8. For asymptotically large graphs, the size of a maximum assignment contained in the 
bipartite graph $B_1$ formed by edges which have minimum cost for either their row or their column is 
almost surely almost exactly $n$ times $2 - e^{-1/e} - e^{-e^{-1/e}}$ (approximately $.8073n$). 
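The constant in Corollary 8 is easy to evaluate and to compare with a simulation. The sketch below simulates only the zero pattern produced by Algorithm 0, using the facts from the proof of Theorem 7 that row minima fall in independent uniformly random columns and that each zero-free column then receives one zero in a uniformly random row; it applies the cleanup pass in row-major order and computes a maximum matching on the surviving zeros with BFS augmenting paths (Kuhn's method). Treating this zero-pattern shortcut as equivalent to running Algorithm 0 on a full matrix is our assumption; the proof of Corollary 10 states that Algorithm 0 yields an all-zero assignment of size about $b_1 n$ with $b_1$ this constant.

```python
import math
import random
from collections import deque
from typing import List, Tuple


def b1_constant() -> float:
    """2 - e^{-1/e} - e^{-e^{-1/e}}, the constant in Corollary 8."""
    lam = math.exp(-1 / math.e)
    return 2 - lam - math.exp(-lam)


def algorithm0_zero_pattern(n: int, rng: random.Random) -> List[Tuple[int, int]]:
    """Positions of zeros left by Algorithm 0 (assumption: only positions matter)."""
    row_min_col = [rng.randrange(n) for _ in range(n)]     # row phase
    zeros = {(i, row_min_col[i]) for i in range(n)}
    covered = set(row_min_col)
    for j in range(n):                                     # column phase
        if j not in covered:
            zeros.add((rng.randrange(n), j))
    rc, cc = [0] * n, [0] * n
    for i, j in zeros:
        rc[i] += 1
        cc[j] += 1
    kept = []
    for i, j in sorted(zeros):                             # row-major cleanup
        if rc[i] > 1 and cc[j] > 1:
            rc[i] -= 1                                     # zero replaced by a fresh r.v.
            cc[j] -= 1
        else:
            kept.append((i, j))
    return kept


def max_matching(n: int, edges: List[Tuple[int, int]]) -> int:
    """Maximum bipartite matching (rows -> columns) via BFS augmenting paths."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
    match_row, match_col = [-1] * n, [-1] * n
    size = 0
    for r in range(n):
        parent = {}                     # column -> row that reached it
        queue, end = deque([r]), -1
        while queue and end < 0:
            x = queue.popleft()
            for c in adj[x]:
                if c in parent:
                    continue
                parent[c] = x
                if match_col[c] < 0:    # free column: augmenting path found
                    end = c
                    break
                queue.append(match_col[c])
        if end >= 0:
            size += 1
        c = end
        while c >= 0:                   # flip the alternating path back to r
            x = parent[c]
            c_next = match_row[x]
            match_row[x], match_col[c] = c, x
            c = c_next
    return size


def matching_fraction(n: int, seed: int = 0) -> float:
    rng = random.Random(seed)
    return max_matching(n, algorithm0_zero_pattern(n, rng)) / n


if __name__ == "__main__":
    print(round(b1_constant(), 4), matching_fraction(5000))
```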

Pittel and Weishaar [PW98] compute the size of a maximum assignment in $B_k$, the graph formed 
of the $k$ cheapest edges out of each row and column, using entirely different methods; for $B_1$ they 
find a definite integral whose numerical evaluation agrees with our closed-form expression. 

Corollary 8 means that at a cost of $1 + 1/e$ (in row and column dual variables), Lazarus’s con- 
struction produces an assignment of size about $.81n$ consisting entirely of zero-weight edges, with 
i.i.d. exponential edge weights elsewhere. How cheaply can this assignment be completed? 



5. Augmenting an m-assignment 

First consider a simpler question. Suppose an $n \times n$ matrix $A$ has $n - 1$ zeros along its diagonal 
(a zero-cost assignment of size $n - 1$), and i.i.d. exponential entries elsewhere. How cheaply can this 
assignment be completed? The simplest way to complete the assignment is to assign row $n$ to column 
$n$. This adds 1 to the expected cost, since $\mathbb{E}a_{nn} = 1$; as our goal is to find an assignment of cost less 
than 2 (ideally, $\pi^2/6$), an additional cost of 1 is far too much. 

A better way (see Figure 2) is to remove from the matching some edge $a_{ii}$, and add the two 
edges $a_{in}$ and $a_{ni}$; this is augmentation by the 3-edge alternating path $a_{ni}, a_{ii}, a_{in}$. There are 
$n - 1$ ways of doing this (choosing $i \in 1, \ldots, n-1$), and the cheapest will have expected cost 
$\mathbb{E}\min_i(a_{in} + a_{ni}) = O(1/\sqrt{n})$. 

We can implement the same principle a little differently, fixing a value $s$, choosing the $s$ smallest 
elements in the unmatched (or “free”) column, then selecting the cheapest corresponding element in 
the unmatched row. The $s$ smallest “column elements” have average expectation $(1/n + \cdots + s/n)/s = 
(s+1)/(2n)$, and the smallest of the corresponding $s$ “row elements” has expectation $1/s$, so for 
$s \approx \lfloor\sqrt{2n}\rfloor$, the expected added cost is asymptotically $\sqrt{2/n}$. 
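The $s$-smallest trick is easy to simulate. The sketch below (hypothetical function names) draws the $n - 1$ entries $a_{in}$ of the free column and the $s$ corresponding entries $a_{ni}$ of the free row as independent mean-1 exponentials, and compares the average added cost with the estimate $(s+1)/(2n) + 1/s$. `best_s` scans integer $s$ to minimize that estimate, which lands near $\sqrt{2n}$.

```python
import math
import random
from statistics import fmean


def added_cost_estimate(n: int, s: int) -> float:
    """(s + 1)/(2n) for the chosen column element plus 1/s for the row element."""
    return (s + 1) / (2 * n) + 1 / s


def best_s(n: int) -> int:
    return min(range(1, n), key=lambda s: added_cost_estimate(n, s))


def simulate_added_cost(n: int, s: int, trials: int, seed: int = 0) -> float:
    rng = random.Random(seed)

    def one_trial() -> float:
        # s smallest of the n - 1 entries a_in (i < n) in the free column
        col = sorted(rng.expovariate(1.0) for _ in range(n - 1))[:s]
        # the matching entries a_ni of the free row; pick the cheapest one
        row = [rng.expovariate(1.0) for _ in range(s)]
        j = min(range(s), key=row.__getitem__)
        return col[j] + row[j]

    return fmean(one_trial() for _ in range(trials))


if __name__ == "__main__":
    n = 400
    s = best_s(n)
    print(s, added_cost_estimate(n, s), simulate_added_cost(n, s, 3_000),
          math.sqrt(2 / n))
```

Since the row entries are independent of the column order, the chosen index is uniform among the $s$ candidates, which is what makes the column term average to roughly $(s+1)/(2n)$.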

More generally, suppose only $m$ of the $n$ rows and columns are matched, each at cost 0. Again 
we can augment to an $(m+1)$-assignment by constructing a 3-edge alternating path from the first 
unmatched row to one of the unmatched columns, as per the following algorithm: 
Algorithm 2 

Input: matrix $A$; $m$-assignment of zero-weight edges, with $m > n/2$; 
positive integer $s \le m/(n - m)$. 

For each unmatched column $k = m + 1 \ldots n$ in turn: 
    Choose the $s$ cheapest elements $a_{ik}$ among rows $i \in 1 \ldots m$. 
    For each row $i$ chosen: 
        Define $\mathrm{parent}(i) = k$. 
        Eliminate row $i$ from future choices. 
Let $S$ be the set of row indices $i$ chosen. ($|S| = s(n - m)$.) 
Let $j = \arg\min_{j \in S} \{a_{m+1,j}\}$, the index of the cheapest of the $s(n - m)$ 
elements in row $m + 1$ and columns $S$. 
Add edges $a_{m+1,j}$ and $a_{j,\mathrm{parent}(j)}$ to the assignment, and remove edge $a_{jj}$. 
Let $u = 0$, except for $u_{m+1} = a_{m+1,j}$. 
Let $v = 0$, except for $v_{\mathrm{parent}(j)} = a_{j,\mathrm{parent}(j)}$. 
Let $A' = A - u\mathbf{1}^T - \mathbf{1}v^T$. 
Replace any negative elements $a'_{ij}$ with fresh exponential r.v.’s. 
Replace element $a'_{jj}$ with a fresh exponential r.v. 

Lemma 9. Given an $n \times n$ matrix $A$ with elements $a_{ii} = 0$ for $i = 1, \ldots, m$ and i.i.d. exponential 
random elsewhere, with $n - m > 0$. Then Algorithm 2 produces $A'$, $u$, and $v$ such that: $A \le 
A' + u\mathbf{1}^T + \mathbf{1}v^T$; 

$$\mathbb{E}\Bigl(\sum_i u_i + \sum_j v_j\Bigr) \le \frac{1}{s(n-m)} + \frac{s+1}{2s(n-m)} \ln\frac{m}{m - s(n-m)};$$

and the elements of $A'$ are i.i.d. exponential except on an $(m+1)$-assignment all of whose edge 
weights are zero. 

Proof. The first claim, that $A \le A' + u\mathbf{1}^T + \mathbf{1}v^T$, is immediate by construction. For the second 
claim, 

$$\sum_i u_i + \sum_j v_j = a_{m+1,j} + a_{j,\mathrm{parent}(j)}.$$

As the smallest of $s(n-m)$ elements, $\mathbb{E}a_{m+1,j} = 1/(s(n-m))$. Furthermore, the smallest ele- 
ment occurs in random position: $j \in S$ is chosen uniformly at random, and so the expectation of 
the chosen column element $a_{j,\mathrm{parent}(j)}$ is merely the expected mean of all the elements $a_{j,\mathrm{parent}(j)}$ 
generated by the algorithm. The first $s$ of these elements chosen, from amongst rows $1 \ldots m$, 
have expectations $1/m, \ldots, s/m$; the next $s$ chosen (from amongst $s$ fewer rows) have expecta- 
tions $1/(m-s), \ldots, s/(m-s)$; and so forth, with the last column’s elements having expectations 
$1/(m - (n-m-1)s), \ldots, s/(m - (n-m-1)s)$. The mean is 

$$\mathbb{E}a_{j,\mathrm{parent}(j)} = \frac{1}{n-m} \sum_{k=0}^{n-m-1} \frac{s+1}{2}\cdot\frac{1}{m - ks} \le \frac{s+1}{2(n-m)} \int_0^{n-m} \frac{dx}{m - xs} = \frac{s+1}{2s(n-m)} \ln\frac{m}{m - s(n-m)}.$$

Thus 

$$\mathbb{E}\Bigl(\sum_i u_i + \sum_j v_j\Bigr) \le \frac{1}{s(n-m)} + \frac{s+1}{2s(n-m)} \ln\frac{m}{m - s(n-m)}.$$

Finally, we must show that apart from the $m + 1$ zeros, the elements of $A'$ are i.i.d. exponential 
r.v.’s. Before $a_{m+1,j}$ is chosen, row $m + 1$ is unrevealed. Its revelation can be simulated by: choosing 
$j \in S$ uniformly at random; setting $a_{m+1,j}$ equal to an exponential random variable with mean 
$1/|S|$; for the remaining $j' \in S - \{j\}$, setting $a_{m+1,j'} = a_{m+1,j} + X_{j'}$, where the $X_{j'}$ are independent 
mean-1 exponential r.v.’s; and for $j' \notin S$, setting $a_{m+1,j'} = X_{j'}$, where the $X_{j'}$ are again independent 
mean-1 exponentials. Subtracting $a_{m+1,j}$ from each $a_{m+1,j'}$, and replacing negative values with fresh 
exponentials, results in an i.i.d. exponential $n$-vector. 
The above simulation begins by choosing j*. This can be done by specifying in advance, at random, the time (from 1 to s(n - m)) at which j* was added to S. That lets us stop generating the set S once j* is produced, say while choosing from column k. It is crucial that all columns were equally represented, with s choices each. This ensures that k is chosen uniformly at random from m + 1, m + 2, ..., n, and in turn that the columns k' ≠ k not chosen retain their unbiased i.i.d. exponential distribution. (Otherwise, for example, high-probability selection of a column k with particularly small entries would bias the untouched columns to have large entries.) In column k, elements not probed remain random; these are elements whose rows were unmatched or were selected for an earlier column. Probed elements smaller than the chosen one become negative when it is subtracted, and are replaced by fresh r.v.'s. Probed elements larger than the chosen one become unbiased exponentials when it is subtracted, independent of one another and of the rest of the matrix. □
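The revelation step relies on the memoryless property of the exponential distribution. The minimum of |S| i.i.d. mean-1 exponentials has mean 1/|S|, and the other values exceed that minimum by independent mean-1 exponentials. The Python sketch below checks these two means by simulation. It is an illustration, not part of the proof. The set size |S| = 8, the sample count, the seed and the helper names (`reveal_row`, `check_memoryless`) are our own assumptions.

```python
import random

def reveal_row(size_s: int, rng: random.Random):
    """Reveal row m+1 on the |S| candidate columns.

    Returns the minimum (the entry chosen as a_{m+1,j*}) and the
    excesses of the other entries over that minimum.
    """
    row = [rng.expovariate(1.0) for _ in range(size_s)]  # i.i.d. mean-1 exponentials
    j_star = min(range(size_s), key=row.__getitem__)
    minimum = row[j_star]
    excesses = [x - minimum for j, x in enumerate(row) if j != j_star]
    return minimum, excesses

def check_memoryless(size_s: int = 8, trials: int = 40_000, seed: int = 1):
    """Estimate E[minimum] (theory: 1/|S|) and E[excess] (theory: 1)."""
    rng = random.Random(seed)
    min_total = excess_total = 0.0
    excess_count = 0
    for _ in range(trials):
        minimum, excesses = reveal_row(size_s, rng)
        min_total += minimum
        excess_total += sum(excesses)
        excess_count += len(excesses)
    return min_total / trials, excess_total / excess_count

mean_min, mean_excess = check_memoryless()
print(f"E[min] ~ {mean_min:.4f} (1/|S| = 0.125), E[excess] ~ {mean_excess:.4f} (1)")
```

Only the means are checked here. A fuller test would also compare the distributions, for example with a goodness-of-fit statistic.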

Corollary 10. An n×n matrix A of i.i.d. exponential r.v.'s, with n large, has an assignment whose expected cost is less than 2.92. Therefore, c* < 2.92.



Proof. Start with a matrix that has an all-zero assignment of size m, and apply Algorithm 2 repeatedly (n - m times). This produces vectors u(m+1) and v(m+1), u(m+2) and v(m+2), ..., up to u(n) and v(n). By Lemma 9, choosing s at each iteration to minimize the cost bound added, the total weight of these vectors satisfies

$$\sum_{m'=m+1}^{n}\mathbf{E}\Big(\sum_i u_i(m') + \sum_j v_j(m')\Big) \le \sum_{m'=m}^{n-1}\min_{s\in\mathbb{Z}^+}\left(\frac{1}{s(n-m')} + \frac{s+1}{2s(n-m')}\ln\frac{m'}{m'-s(n-m')}\right) \approx \int_{m}^{n}\min_{s\in\mathbb{Z}^+}\left(\frac{1}{s(n-m')} + \frac{s+1}{2s(n-m')}\ln\frac{m'}{m'-s(n-m')}\right)dm'$$

and, changing variables to x = 1 - m'/n,

$$= \int_0^{1-m/n}\min_{s\in\mathbb{Z}^+}\left(\frac{1}{sx} + \frac{s+1}{2sx}\ln\frac{1-x}{1-(s+1)x}\right)dx.$$

Take $m/n = b_1 = 2 - e^{-1/e} - e^{-e^{-1/e}} \approx .81$. A combination of numerical techniques (for x large, say x > .01) and algebraic ones (for x small) then yields $\sum_{m'=m+1}^{n}\mathbf{E}\big(\sum_i u_i(m') + \sum_j v_j(m')\big) < 1.55$. In other words, consider a matrix A' that is i.i.d. exponential except for an all-zero assignment of size (1 + o(1))b_1 n. Its assignment cost satisfies E A'* < 1.55 + o(1).

Applying Algorithm 0 to an i.i.d. exponential matrix A produces a matrix containing:
- an all-zero assignment of size (1 + o(1))b_1 n,
- some additional zeros, and
- i.i.d. exponential r.v.'s.

Replacing its non-assignment zeros with new exponential r.v.'s produces a matrix A' as required, with E A* ≤ E A'* + 1 + 1/e. Thus for a random i.i.d. exponential matrix A, E A* < 1 + 1/e + 1.55 < 2.92. □
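The final numerical step can be approximately reproduced. The Python sketch below makes several assumptions:
- After the change of variables x = 1 - m'/n, the Lemma 9 bound (multiplied by n) is taken to be 1/(sx) + (s+1)/(2sx) ln((1-x)/(1-(s+1)x)).
- Only integers s with (s+1)x < 1 are admissible.
- b_1 = 2 - e^{-1/e} - e^{-e^{-1/e}} ≈ 0.807.

The sketch minimizes over s in a window around sqrt(2/x), which is the small-x optimum. It then integrates over 0 < x < 1 - b_1 with a midpoint rule after substituting x = t^2, which removes the 1/sqrt(x) singularity. The search window, the step count and the helper names are our choices. This is a plain numerical check, not the authors' mixed numerical and algebraic procedure.

```python
import math

def step_cost(s: int, x: float) -> float:
    """n times the Lemma 9 bound for one augmentation, with x = 1 - m'/n.

    Assumes (s + 1) * x < 1 so that the logarithm is defined.
    """
    first_edge = 1.0 / (s * x)  # cheapest of the s(n - m') child entries
    parent_edge = (s + 1) / (2.0 * s * x) * math.log((1.0 - x) / (1.0 - (s + 1) * x))
    return first_edge + parent_edge

def best_cost(x: float) -> float:
    """Minimize step_cost over admissible integers s near sqrt(2 / x)."""
    s0 = math.sqrt(2.0 / x)
    s = max(1, int(s0 / 2))
    hi = int(2 * s0) + 2
    best = math.inf
    while s <= hi and (s + 1) * x < 1.0:
        best = min(best, step_cost(s, x))
        s += 1
    return best

def total_bound(b1: float, steps: int = 4000) -> float:
    """Integrate best_cost over 0 < x < 1 - b1 via x = t**2 and the midpoint rule."""
    t_max = math.sqrt(1.0 - b1)
    h = t_max / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += best_cost(t * t) * 2.0 * t * h  # dx = 2t dt
    return total

b1 = 2 - math.exp(-math.exp(-1)) - math.exp(-math.exp(-math.exp(-1)))
bound = total_bound(b1)
print(f"b1 = {b1:.4f}, integral ~ {bound:.3f}, 1 + 1/e + integral ~ {1 + 1 / math.e + bound:.3f}")
```

The result should come out in the vicinity of the 1.55 quoted above. A rigorous bound would still need error control for both the quadrature and the restricted search over s.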



To our knowledge, this is the first proof of c* < 3 that is based on algorithmically constructing an 
assignment of such cost. 



6. An upper bound less than 2 



Theorem 11. An n×n matrix of i.i.d. exponential r.v.'s, with n large, has an assignment whose expected cost is less than 1.94. Therefore c* < 1.94.
Proof. The proof is again based on an algorithmic construction, which unfortunately we will only
be able to sketch here. (See [CS98] for details.) In the previous algorithm, each unmatched "free"
column generated s "child" columns; row m + 1 was assigned to one child, as the first link of a 3-edge
alternating path ending at the child's parent.

Instead, we now let each free column, in turn, generate a single child. (As before, the children 
must be distinct, and disjoint from the parents.) Add the list of children to the end of the list of 
parents, doubling the number of free columns. Repeat this “doubling” procedure a parametrized 
number of times, which will in fact be until a constant-bounded fraction of the columns are free. 
Assign row m + 1 to the cheapest free column, as the first link of an alternating path, and continue
the alternating path by reassigning that column to its parent, the parent to the grandparent, and so 
forth. (The path lengths are typically, and at most, logarithmic in the ratio of n to the number of 
initially free columns.) 

As before, the first assigned edge's value a_{m+1,j} is subtracted from row m + 1, and the other newly
assigned edges' values are subtracted from their respective columns. The analysis follows the familiar
lines: estimating the values of the newly assigned elements, and proving that the elements of the
residual matrix are (outside of its m + 1 zeros) again i.i.d. exponential r.v.'s.

To push the expected cost below 2, two other tricks are needed. The first is to do with the initial 
assignment and the initial set of free columns. After Algorithm 0 (refer to Figure 1), assign all rows 
with unique zeros to their corresponding columns, and to each column with two or more zeros, assign 
the first corresponding row. The later rows for such columns need to be assigned, and there are about 
.19n of them. The initial free columns are those corresponding to rows with several zeros and there 
are about .36n of them; it is an advantage that they outnumber the rows needing assignment. A row 
with several zeros, while not explicitly assigned, is not a concern: when all but one of its columns 
has been consumed, we assign it to that last one. 

For the second trick, suppose that a column j initially contained two zeros, in rows i_1 and i_2; that
row i_1 has been assigned to column j; and that we are now trying to assign row i_2. Go through the
column-doubling procedure as usual. Now, however, do not select as the first edge of the alternating
path the smallest value in row i_2 among the free columns. Instead, take the smallest value in rows i_1 or i_2
among the free columns. If the value falls in row i_2, proceed as usual. If it falls in row i_1, go through
the same alternating path construction for row i_1, and assign row i_2 to column j. The virtue of
the trick is that because we are choosing from twice as many values, the expectation of the first 
element of the alternating path is halved. The price we pay is that whichever row did not contain the 
smallest element is now biased, must be removed from consideration in all the remaining assignment 
augmentations, and, by thus reducing the options slightly, increases future augmentation costs. The 
tradeoff turns out to be advantageous, and the trick is employed for the last zero-bearing row of any 
column. □ 



7. Exact optima 



Parisi's conjecture [Par98] that $\mathbf{E}A^* = \sum_{i=1}^{n} 1/i^2$ seems to offer the best "handle" on the assignment problem. To that end, we have verified it to the extent we could, and in the process generalized it further:



Conjecture 12. Given an m×n array of i.i.d., exponentially distributed random variables a_ij with mean 1, select k ≤ min(m, n) elements, no two in the same row or column, such as to minimize the sum of those variables. (A k-assignment.) The expected value of this sum is

$$\sum_{i,j\ge 0,\; i+j<k} \frac{1}{(m-i)(n-j)}.$$
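The conjectured value is a finite sum of rationals, so exact arithmetic can evaluate it directly. The Python sketch below uses `fractions.Fraction`; the function name `conjectured_value` is our own. Its printed values match the cases used below: 1/mn for k = 1, 65/72 for k = m = 3, n = 4, and Parisi's 1 + 1/4 + 1/9 + 1/16 for k = m = n = 4.

```python
from fractions import Fraction

def conjectured_value(k: int, m: int, n: int) -> Fraction:
    """Conjecture 12: sum of 1/((m - i)(n - j)) over i, j >= 0 with i + j < k."""
    if not 1 <= k <= min(m, n):
        raise ValueError("need 1 <= k <= min(m, n)")
    total = Fraction(0)
    for i in range(k):
        for j in range(k - i):  # all j >= 0 with i + j < k
            total += Fraction(1, (m - i) * (n - j))
    return total

print(conjectured_value(1, 5, 7))  # 1/35
print(conjectured_value(3, 3, 4))  # 65/72
print(conjectured_value(4, 4, 4))  # 205/144 = 1 + 1/4 + 1/9 + 1/16
```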
We will show that the conjecture is true in the following four cases: k = 1; k = 2; k = m = 3;
and k = m = n = 4. We will also show that in the case k = m = n, the conjectured sum reduces to
$\sum_{i=1}^{k} 1/i^2$, agreeing with Parisi's conjecture. Unfortunately, it seems difficult to extend these results.

Notation. By F(k, m, n) we denote the expected value of the minimal k-assignment on an m×n
array of i.i.d. exponential random variables with mean 1. By G(k, m, n) we denote the expected
value of the same problem, except with a_11 = 0.

A few lemmas will be useful. 

Lemma 13. We have F(k, m, n) = G(k, m, n) + k/mn.

Proof. Start with an m×n array A of i.i.d. exponential variables. Its global minimum has expected
value 1/mn. Subtract this global minimum from all entries, to produce an m×n array B of i.i.d.
exponential variables except for one entry of 0, which we may take to be b_11 = 0. An optimal
k-assignment on A differs from one on B by the addition of the global minimum to each of the k
selected elements. Then E A* = E B* + k/mn. □

Lemma 14. For integers k ≤ m ≤ n, consider a nonnegative m×n matrix A with a_11 > 0, where
row 1 has y zero entries and column 1 has z zero entries, with y + z ≥ k. Form the matrix B which
agrees with A except that b_11 = 0. Then the optimal k-assignments have equal values: A* = B*.

Proof. Suppose an optimal k-assignment of A uses element a_11. We claim that, among the remaining
k - 1 elements of the assignment, exactly y are in columns j such that a_1j = 0. Otherwise we
could replace a_11 in the assignment by one of the a_1j = 0 from an unused column, obtaining an
assignment with a smaller value. Similarly, z elements are in rows i such that a_i1 = 0. Since
y + z > k - 1, one of the k - 1 elements a_ij of the assignment satisfies both conditions: a_i1 = a_1j = 0.
Replacing a_11 + a_ij by a_i1 + a_1j = 0 would give another k-assignment with a smaller value. In
either case optimality is violated. So any optimal k-assignment for A avoids a_11, and thus has the
same value as an optimal k-assignment for B. □

We will use this lemma to insert zeros into matrices. 

Theorem 15. Conjecture 12 holds in the cases k = 1; k = 2; k = m = 3; and k = m = n = 4.

Proof. 

Case k = 1: The smallest among mn i.i.d. mean-1 exponential variables has expected value 1/mn.

Case k = 2: We appeal to Lemma 13. Consider an m×n matrix B in which b_11 = 0 and all other b_ij are i.i.d. exponential with mean 1. We show that its smallest 2-assignment has expected value G(2, m, n) = (mn - 1)/[mn(m - 1)(n - 1)].

Any 2-assignment of the B matrix must involve at least one entry from rows i ≠ 1. The minimum
such entry d has expected value 1/[(m - 1)n]. With probability (n - 1)/n, this entry occurs outside
the first column, so that 0 + d = d is the smallest assignment.

With probability 1/n, the entry occurs in the first column. But some entry in the minimum 2-assignment must come from a column other than the first. So in this case we construct the m×(n - 1)
matrix C = (c_ij), 1 ≤ i ≤ m, 2 ≤ j ≤ n, where

$$c_{ij} = \begin{cases} b_{ij} & \text{if } i = 1,\ j \in \{2, 3, \ldots, n\},\\ b_{ij} - d & \text{if } i \neq 1,\ j \in \{2, 3, \ldots, n\}.\end{cases}$$

Again the entries c_ij are i.i.d. exponential with mean 1, so the minimum element e = c_ij has expected
value 1/[m(n - 1)]. If this minimum occurs in row i = 1, then the minimal 2-assignment in the
B matrix is d + e. (The two entries d and e form a valid 2-assignment, and no 2-assignment can be
smaller.) If the minimum occurs in row i ≠ 1, then the minimum 2-assignment of the B matrix is
b_11 + b_ij = (0) + (d + e). Summing,

$$G(2,m,n) = \mathbf{E}(d) + \frac{1}{n}\mathbf{E}(e) = \frac{1}{(m-1)n} + \frac{1}{nm(n-1)} = \frac{(mn-m)+(m-1)}{mn(m-1)(n-1)} = \frac{mn-1}{mn(m-1)(n-1)}$$

and

$$F(2,m,n) = G(2,m,n) + \frac{2}{mn} = \frac{3mn-2m-2n+1}{mn(m-1)(n-1)} = \frac{1}{mn} + \frac{1}{m(n-1)} + \frac{1}{(m-1)n}.$$
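For small arrays the k = 2 formula can be checked by simulation. The Python sketch below brute-forces the minimal k-assignment over all choices of k rows and k ordered columns. It averages that minimum over random mean-1 exponential arrays and compares the result with (3mn - 2m - 2n + 1)/(mn(m - 1)(n - 1)). The array size 3×4, the trial count, the seed and the helper names are illustrative assumptions. The brute force is exponential in k and is only meant for tiny cases.

```python
import itertools
import random

def min_k_assignment(a, k):
    """Brute-force minimal k-assignment: k entries, no two in a row or column."""
    m, n = len(a), len(a[0])
    best = float("inf")
    for rows in itertools.combinations(range(m), k):
        for cols in itertools.permutations(range(n), k):
            best = min(best, sum(a[r][c] for r, c in zip(rows, cols)))
    return best

def monte_carlo_f(k, m, n, trials=10_000, seed=7):
    """Average minimal k-assignment over random m x n mean-1 exponential arrays."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        a = [[rng.expovariate(1.0) for _ in range(n)] for _ in range(m)]
        total += min_k_assignment(a, k)
    return total / trials

def closed_form_f2(m, n):
    """F(2, m, n) = (3mn - 2m - 2n + 1) / (mn(m - 1)(n - 1)) from the k = 2 case."""
    return (3 * m * n - 2 * m - 2 * n + 1) / (m * n * (m - 1) * (n - 1))

estimate = monte_carlo_f(2, 3, 4)
print(f"simulated F(2,3,4) ~ {estimate:.4f}, formula {closed_form_f2(3, 4):.4f}")
```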



Case k = m = 3: 

Let a_{i,h_i} be the least element in row i, for i = 1, 2, 3. We have three possibilities.

With probability (n - 1)(n - 2)/n^2, all the columns h_i are distinct, and the assignment is just
Σ_i a_{i,h_i}. In this case the expected value is clearly 3/n. (The values of the row minima are independent
of their relative positions.)

With probability 3(n - 1)/n^2, two of the columns agree and one is different. Define a new array
b_ij with b_ij = a_ij - a_{i,h_i}. Without loss of generality, assume h_1 = h_2 = 1 and h_3 = 2. The values
b_ij are i.i.d. exponential, except b_11 = b_21 = b_32 = 0. The least 3-assignment of A differs from that
of B by a_11 + a_21 + a_32, whose expected value is 3/n.

Since column 1 has two zeros and row 3 has one zero, we can use Lemma 14 to insert a zero, and
set b_31 = 0. An optimal 3-assignment on B is then obtained from an optimal 2-assignment on the
3×(n - 1) array {b_ij | j = 2, 3, ..., n}, augmented with a zero from the first column and the unique
row not used in the 2-assignment. So

$$\mathbf{E}B^* = G(2,3,n-1) = F(2,3,n-1) - \frac{2}{3(n-1)} = \frac{1}{3(n-1)} + \frac{1}{2(n-1)} + \frac{1}{3(n-2)} - \frac{2}{3(n-1)} = \frac{3n-4}{6(n-1)(n-2)},$$

and the optimal assignment on A has expected value 3/n larger:

$$\mathbf{E}A^* = \frac{3}{n} + \frac{3n-4}{6(n-1)(n-2)}.$$



With probability 1/n^2 all three columns are the same. Let this column be column 1. Again we
subtract the row minima a_i1 from each row to obtain a 3×n array b_ij = a_ij - a_i1. We want a minimal
2-assignment from the 3×(n - 1) array {b_ij | j = 2, 3, ..., n}. We will augment that with the zero
b_i1 = 0 from the first column in the unique unused row i. So, appealing to the previously proved
case, the expected value of the 3-assignment of the b matrix is

$$F(2,3,n-1) = \frac{1}{3(n-1)} + \frac{1}{2(n-1)} + \frac{1}{3(n-2)} = \frac{7n-12}{6(n-1)(n-2)},$$

and that of the a matrix is

$$\frac{3}{n} + \frac{7n-12}{6(n-1)(n-2)}.$$



Summing over the three cases, the expected value is

$$\frac{(n-1)(n-2)}{n^2}\left(\frac{3}{n}\right) + \frac{3(n-1)}{n^2}\left(\frac{3}{n} + \frac{3n-4}{6(n-1)(n-2)}\right) + \frac{1}{n^2}\left(\frac{3}{n} + \frac{7n-12}{6(n-1)(n-2)}\right).$$

One can check algebraically that this is equal to

$$\frac{1}{3n} + \frac{1}{2n} + \frac{1}{n} + \frac{1}{3(n-1)} + \frac{1}{2(n-1)} + \frac{1}{3(n-2)}.$$
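The algebraic check can also be done mechanically with exact rational arithmetic. The Python sketch below evaluates the weighted three-case sum and the conjectured six-term sum for a range of n and confirms that they agree. The function names are our own, and agreement for finitely many n is a spot check, not a proof.

```python
from fractions import Fraction as Fr

def case_analysis_k3(n: int) -> Fr:
    """Weighted sum of the three cases for k = m = 3 (requires n >= 3)."""
    p_distinct = Fr((n - 1) * (n - 2), n * n)
    p_two_agree = Fr(3 * (n - 1), n * n)
    p_all_agree = Fr(1, n * n)
    base = Fr(3, n)                                      # expected sum of the row minima
    extra_two = Fr(3 * n - 4, 6 * (n - 1) * (n - 2))     # G(2, 3, n-1)
    extra_all = Fr(7 * n - 12, 6 * (n - 1) * (n - 2))    # F(2, 3, n-1)
    return (p_distinct * base
            + p_two_agree * (base + extra_two)
            + p_all_agree * (base + extra_all))

def conjectured_k3(n: int) -> Fr:
    """Conjecture 12 with k = m = 3."""
    return (Fr(1, 3 * n) + Fr(1, 2 * n) + Fr(1, n)
            + Fr(1, 3 * (n - 1)) + Fr(1, 2 * (n - 1)) + Fr(1, 3 * (n - 2)))

print(all(case_analysis_k3(n) == conjectured_k3(n) for n in range(3, 30)))  # True
print(case_analysis_k3(4))  # 65/72
```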

Case k = m = n = 4: 

Subtract the row minimum from each row, incurring an expected cost 4/4=1, and leaving an array 
bij of random variables, four of which are 0, and the rest of which are i.i.d. exponential. We will 
break into subcases, depending on the relative positions of the row minima. For each subcase we 
will develop three numbers: P, the probability of this case occurring (including symmetry); L, the 
expected cost of removing row minima and column minima; and R, the expected cost of the residual 
array b. 

Subcase 1: All row minima in the same column. Say b_11 = b_21 = b_31 = b_41 = 0. The
probability of such an arrangement is P = (1/4)^3 = 1/64. An optimal 4-assignment on b is obtained
from an optimal 3-assignment on (b without its first column) by adding a 0 entry from the first
column. We have P = 1/64, L = 4/4, and (by the previously proved case) R = F(3, 3, 4) = 65/72.

Subcase 2: Three row minima in one column, the fourth in another. Say b_11 = b_21 =
b_31 = b_42 = 0. We calculate P: There are C(4, 3) = 4 ways to select the three rows whose columns
agree. Each way has probability (1/4)^2(3/4) because two specified rows agree with the first, and
the last disagrees. So P = 4(1/4)^2(3/4) = 12/64. Use Lemma 14 to insert a zero: b_41 = 0.
Find an optimal 3-assignment on b without its first column. We have P = 12/64, L = 4/4, and
R = G(3, 3, 4) = 65/72 - 3/12 = 47/72.

Subcase 3: Two row minima in one column, two in another. Say b_11 = b_21 = b_32 = b_42 = 0.
The probability is 3(1/4)(3/4)(1/4) = 9/64. We will subtract column minima from the third and
fourth columns; this incurs a cost of 1/4 for each column, giving L = 6/4. We call the resulting array
c. We further break into subcases depending on the positions of the column minima.

Subcase 3a: Two column minima in one row. Say c_13 = c_14 = 0. The net probability of this
case is P = (1/4)(9/64) = 9/256. Use Lemma 14 to add zeros, successively: c_12 = 0, then c_22 = 0.
The residual cost is that of a 2-assignment on the 3×3 array obtained by deleting the first row and
second column from c; this array has one 0 element, so the cost is G(2, 3, 3) = 2/9. This subcase has
P = 9/256, L = 6/4, and R = 2/9.

Subcase 3b: Two column minima in first and second rows. Say c_13 = c_24 = 0. (The third
and fourth rows could also be used.) The probability of this case is P = (1/4)(9/64) = 9/256. Use
Lemma 14 to insert zeros, successively, at positions c_12, c_22, c_14 and c_23. The residual cost is the
minimum 1-assignment (single element) from the 2×3 array obtained by deleting the first two rows
and the second column from c. Thus we have P = 9/256, L = 6/4, and R = F(1, 2, 3) = 1/6.

Subcase 3c: Two column minima in first and third rows. Say c_13 = c_34 = 0. This
case includes one minimum in either the first or second row and the other in the third or fourth
row. Its probability is P = (2/4)(9/64) = 18/256, and L = 6/4. Its residual cost is 0, since
c_13 = c_21 = c_34 = c_42 = 0 is an optimal assignment. So P = 18/256, L = 6/4, and R = 0.

Subcase 4: Two row minima in one column, the others in two different columns. Say
b_11 = b_21 = b_32 = b_43 = 0. The probability of such an arrangement is 6(1/4)(2/4)(3/4) = 36/64.
Again we subtract the column minimum from the fourth column, and break into two subcases depending on its position.

Subcase 4a: Column minimum in first (or second) row. Say c_14 = 0. This has probability
P = (2/4)(36/64) = 18/64. The residual cost is 0 because c_14 = c_21 = c_32 = c_43 = 0. So P = 18/64,
L = 5/4, R = 0.

Subcase 4b: Column minimum in third (or fourth) row. Say c_34 = 0. This has probability
P = (2/4)(36/64) = 18/64. Use Lemma 14 to insert zeros, successively, at c_31, c_41, and c_33. We
need a 2-assignment from the 3×3 array obtained by deleting the first column and third row from c;
this has one 0 at c_43 = 0, so its residual cost is G(2, 3, 3) = 2/9. So P = 18/64, L = 5/4, R = 2/9.

Subcase 5: All row minima in distinct columns. For example, b_11 = b_22 = b_33 = b_44 = 0.
The optimal assignment is 0, and no further work is required. We have P = (3/4)(2/4)(1/4) = 6/64,
L = 4/4, and R = 0.

The overall expected cost is computed as

$$F(4,4,4) = \sum_i P_i(L_i + R_i) = \frac{1}{1} + \frac{1}{4} + \frac{1}{9} + \frac{1}{16}.$$

This concludes the proof of Theorem 15. □ 
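The bookkeeping in the k = m = n = 4 case can be checked with exact fractions. The Python sketch below transcribes the (P, L, R) triples of the subcases above. It confirms that the probabilities sum to 1 and that Σ P(L + R) equals 1 + 1/4 + 1/9 + 1/16. The dictionary layout and variable names are our own.

```python
from fractions import Fraction as Fr

# (P, L, R) for each subcase, transcribed from the proof above.
subcases = {
    "1":  (Fr(1, 64),   Fr(4, 4), Fr(65, 72)),
    "2":  (Fr(12, 64),  Fr(4, 4), Fr(47, 72)),
    "3a": (Fr(9, 256),  Fr(6, 4), Fr(2, 9)),
    "3b": (Fr(9, 256),  Fr(6, 4), Fr(1, 6)),
    "3c": (Fr(18, 256), Fr(6, 4), Fr(0)),
    "4a": (Fr(18, 64),  Fr(5, 4), Fr(0)),
    "4b": (Fr(18, 64),  Fr(5, 4), Fr(2, 9)),
    "5":  (Fr(6, 64),   Fr(4, 4), Fr(0)),
}

total_probability = sum(p for p, _, _ in subcases.values())
expected_cost = sum(p * (l + r) for p, l, r in subcases.values())
parisi = sum(Fr(1, i * i) for i in range(1, 5))

print(total_probability)        # 1
print(expected_cost)            # 205/144
print(expected_cost == parisi)  # True
```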

Theorem 16. With k = m = n, Conjecture 12 reduces to Parisi's conjecture that $F(n) = \sum_{i=1}^{n} 1/i^2$.

Proof. With $S_n = \sum_{i,j\ge 0,\ i+j<n} \frac{1}{(n-i)(n-j)}$, the inductive proof centers on showing that $S_{n+1} - S_n = 1/(n+1)^2$. □
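The inductive step can be spot-checked exactly for small n. The Python sketch below (function name `s_n` is our own) computes S_n with exact fractions, checks S_{n+1} - S_n = 1/(n+1)^2 for the first few n, and prints S_4. This is a numerical check of the identity, not a substitute for the inductive proof.

```python
from fractions import Fraction

def s_n(n: int) -> Fraction:
    """S_n: the Conjecture 12 sum with k = m = n."""
    return sum((Fraction(1, (n - i) * (n - j))
                for i in range(n) for j in range(n - i)), Fraction(0))

for n in range(1, 12):
    assert s_n(n + 1) - s_n(n) == Fraction(1, (n + 1) ** 2)  # the inductive step
print(s_n(4))  # 205/144
```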



Acknowledgements 

Greg Sorkin is very grateful to Boris Pittel: for introducing the problem to me, for being constantly
enthusiastic, and for failing to disclose the extent of previous work before I was completely hooked.

Both authors thank David Williamson for helpful discussions. 

References 

[Ald92] David Aldous. Asymptotics in the random assignment problem. Pr. Th. Related Fields, 93:507-534, 1992.

[CS98] Don Coppersmith and Gregory B. Sorkin. Constructive bounds and exact expectations for the random assignment problem. Technical Report RC21133 (94490), IBM Research Report, Yorktown Heights, NY, March 1998. Accessible at http://domino.watson.ibm.com/library/CyberDig.nsf/home.

[DFM86] Martin E. Dyer, Alan M. Frieze, and Colin J.H. McDiarmid. On linear programs with random costs. Mathematical Programming, 35:3-16, 1986.

[Don69] W.E. Donath. Algorithm and average-value bounds for assignment problems. IBM Journal of Research and Development, 13(4):380-386, July 1969.

[GK93] Michel X. Goemans and Muralidharan S. Kodialam. A lower bound on the expected cost of an optimal assignment. Mathematics of Operations Research, 18(2):267-274, May 1993.

[Kar84] R.M. Karp. An upper bound on the expected cost of an optimal assignment. Technical report, Computer Science Division, University of California, Berkeley CA, 1984.

[Kar87] Richard M. Karp. An upper bound on the expected cost of an optimal assignment. In David S. Johnson et al., editors, Discrete Algorithms and Complexity: Proceedings of the Japan-US Joint Seminar, volume 15 of Perspectives in Computing, pages 1-4. Academic Press, 1987.

[KKV94] Richard M. Karp, Alexander H.G. Rinnooy Kan, and Rakesh V. Vohra. Average case analysis of a heuristic for the assignment problem. Mathematics of Operations Research, 19(3):513-522, August 1994.

[Laz79] Andrew J. Lazarus. The assignment problem with uniform (0,1) cost matrix. Master's thesis, Department of Mathematics, Princeton University, Princeton, N.J., 1979.

[Laz93] Andrew J. Lazarus. Certain expected values in the random assignment problem. Oper. Res. Lett., 14(4):207-214, 1993.

[McD89] Colin McDiarmid. On the method of bounded differences. In London Mathematical Society Lecture Note Series, volume 141, pages 148-188. Cambridge University Press, 1989.

[MP85] M. Mezard and G. Parisi. Replicas and optimization. J. Physique Lettres, 46:771-778, September 1985.

[MP87] M. Mezard and G. Parisi. On the solution of the random link matching problems. J. Physique Lettres, 48:1451-1459, September 1987.

[Oli92] Birgitta Olin. Asymptotic Properties of Random Assignment Problems. PhD thesis, Kungl Tekniska Högskolan, Stockholm, Sweden, 1992.

[Par98] Giorgio Parisi. A conjecture on random bipartite matching. Physics e-Print archive, http://xxx.lanl.gov/ps/cond-mat/9801176, January 1998.

[PW98] Boris Pittel and Robert S. Weishaar. The random bipartite nearest neighbor graphs. Submitted for publication, 1998.

[Wal79] David W. Walkup. On the expected value of a random assignment problem. SIAM J. Computing, 8(3):440-442, August 1979.




The “Burnside Process” Converges Slowly* 



Leslie Ann Goldberg¹ and Mark Jerrum²

¹ Department of Computer Science, University of Warwick, Coventry, CV4 7AL,
United Kingdom,
leslie@dcs.warwick.ac.uk,
http://www.dcs.warwick.ac.uk/~leslie
² Department of Computer Science, University of Edinburgh, The King's Buildings,
Edinburgh EH9 3JZ, United Kingdom,
mrj@dcs.ed.ac.uk,
http://www.dcs.ed.ac.uk/~mrj



Abstract. We consider the problem of sampling “unlabelled structures”,
i.e., sampling combinatorial structures modulo a group of symmetries.
The main tool which has been used for this sampling problem is Burnside's
lemma. In situations where a significant proportion of the structures
have no non-trivial symmetries, it is already fairly well understood
how to apply this tool. More generally, it is possible to obtain nearly
uniform samples by simulating a Markov chain that we call the Burnside
process; this is a random walk on a bipartite graph which essentially implements
Burnside's lemma. For this approach to be feasible, the Markov
chain ought to be “rapidly mixing”, i.e., converge rapidly to equilibrium.
The Burnside process was known to be rapidly mixing for some special
groups, and it has even been implemented in some computational group
theory algorithms. In this paper, we show that the Burnside process
is not rapidly mixing in general. In particular, we construct an infinite
family of permutation groups for which we show that the mixing time is
exponential in the degree of the group.



1 Introduction 

The computational task considered in this article is that of sampling “unlabelled
structures”, i.e., sampling combinatorial structures modulo a group of symmetries.
We work within the framework of Pólya theory: “Structures” are taken to
be length-m words over a finite alphabet Σ, and the group of symmetries is taken
to be a permutation group G of degree m which acts on the words by permuting
positions. (See Section 2 for precise definitions.) The image of α ∈ Σ^m under
g is conventionally denoted α^g. Words α and β are in the same orbit if there
is a permutation g ∈ G which maps α to α^g = β. The orbits partition the set
of words into equivalence classes, and the computational problem is to sample
words in such a way that each orbit is equally likely to be output.¹

* This work was supported in part by ESPRIT Projects RAND-II (Project 21726) and
ALCOM-IT (Project 20244), and by EPSRC grant GR/L60982.

¹ Here is a concrete example: Let Σ be a binary alphabet. Encode the adjacency matrix
of an n-vertex graph as a word of length m = (n choose 2) = n(n - 1)/2. The relevant permutation group
is the group (acting on words) which is induced by the group of all permutations of
the n vertices. Note that two graphs are in the same orbit if and only if they are
isomorphic.

M. Luby, J. Rolim, and M. Serna (Eds.): RANDOM'98, LNCS 1518, pp. 331-345, 1998.

The main tool which has been used for sampling orbits is Burnside's Lemma,²
which says that each orbit comes up |G| times (as the first component) in the
set of pairs

$$\Gamma(\Sigma,G) := \{(\alpha,g) : \alpha\in\Sigma^m,\ g\in G \text{ and } \alpha^g=\alpha\}. \tag{1}$$

Thus, we are interested in the computational problem of sampling uniformly at
random from Γ(Σ, G), given (an efficient representation of) G.

Wormald [17] has shown how to solve this sampling problem for rigid structures.
That is, he has given an efficient random sampling algorithm that works
whenever a high fraction of the pairs in Γ(Σ, G) have g equal to the identity permutation.
Wormald's method does not extend to the case in which the identity
permutation contributes only a small fraction³ of the pairs in Γ(Σ, G). However,
Jerrum proposed a natural approach based on Markov chain simulation which
does extend to this case [8].

² Although this lemma is commonly referred to as “Burnside's Lemma”, it is really
due to Cauchy and Frobenius [14].

³ Specifically, Wormald's approach can be used when the fraction of pairs in Γ(Σ, G)
which are due to the identity is at least the inverse of some polynomial in m.

We give the details of the Markov chain simulation approach in Section 2. In
brief, the idea is to consider the following bipartite graph. The vertices on the
left-hand side are all words in Σ^m. The vertices on the right-hand side are all
permutations in G. There is an edge from word α to permutation g if and only
if α^g = α. This graph essentially implements Burnside's Lemma: The lemma
shows that the stationary distribution of a random walk on the graph assigns
equal weight to each orbit, i.e., to each unlabelled structure. The Markov chain
that we consider, which we refer to as the “Burnside process”, is the random
walk on this graph observed on alternate steps.

We may obtain a nearly uniform unlabelled sample by simulating the Burnside
process from a fixed initial state for sufficiently many steps, and returning
the final state. The efficiency of this sampling method depends on the
mixing time of the Burnside process: in rough terms, how many steps
is “sufficiently many”? The aim of this article is to show that the mixing time
of the Burnside process is sometimes very large. We now make that statement
precise.

For any two probability distributions π and π' on a finite set Ω, define the
total variation distance between π and π' to be

$$d_{TV}(\pi,\pi') := \max_{A\subseteq\Omega}|\pi(A)-\pi'(A)| = \frac{1}{2}\sum_{x\in\Omega}|\pi(x)-\pi'(x)|.$$

Suppose M is an ergodic Markov chain with state space Ω and stationary distribution
π. Let π_t be the t-step distribution of M when started in state x_0. The
mixing time of M, given initial state x_0, is a function τ_{x_0} : (0, 1) → ℕ from tolerances
δ to simulation times, defined as follows. For each δ ∈ (0, 1), let τ_{x_0}(δ) be the
smallest t such that d_TV(π_{t'}, π) ≤ δ for all t' ≥ t. If the initial state is not significant
or is unknown, it is appropriate to define τ(δ) = max_x τ_x(δ), where the maximum
is over all x ∈ Ω. By rapid mixing, we mean that τ(δ) ≤ poly(m, log δ^{-1}),
where m is the input size (in our case the degree of the group G) and δ is the
tolerance. Stuart Anderson has suggested the phrase torpid mixing to describe
the contrasting situation where mixing time is exponential in the input size.
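For a chain small enough to write down explicitly, these definitions can be evaluated directly. The Python sketch below computes d_TV and τ(δ) = max_x τ_x(δ) by iterating the t-step distribution from each start state. The helper names are hypothetical. The sketch relies on the standard fact that the distance to π never increases with t, so the first t at which the distance is at most δ also works for every later t'. The two-state example is ours, chosen because its distance after t steps is exactly 2^{-(t+1)}. The chains studied in this article are far too large for this brute-force approach.

```python
def tv_distance(p, q):
    """d_TV(p, q) = (1/2) * sum_x |p(x) - q(x)| for distributions given as lists."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def step(dist, P):
    """One step of the chain: the row vector dist times the transition matrix P."""
    size = len(P)
    return [sum(dist[i] * P[i][j] for i in range(size)) for j in range(size)]

def mixing_time(P, pi, delta, max_steps=10_000):
    """tau(delta) = max over start states x of tau_x(delta).

    Relies on d_TV(pi_t, pi) being non-increasing in t, so the first t with
    distance <= delta also satisfies the condition for every later t'.
    """
    worst = 0
    for x in range(len(P)):
        dist = [1.0 if i == x else 0.0 for i in range(len(P))]  # start in state x
        t = 0
        while tv_distance(dist, pi) > delta:
            if t == max_steps:
                raise RuntimeError("no mixing within max_steps")
            dist = step(dist, P)
            t += 1
        worst = max(worst, t)
    return worst

# Toy two-state chain that switches state with probability 1/4; its distance
# to pi = (1/2, 1/2) after t steps is 2 ** -(t + 1), so tau(0.01) = 6.
P = [[0.75, 0.25], [0.25, 0.75]]
print(mixing_time(P, [0.5, 0.5], 0.01))  # 6
```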

The Burnside process was shown to be rapidly mixing for some very special
groups G [8]. However, it was an open question whether it is rapidly mixing in
general. The precise result of this article (Theorem 1) is a construction of an
infinite family of permutation groups G for which we show that the mixing time
τ(δ) is exponential in the degree of G. Thus, suppose we use the t-step distribution to
estimate the probability π(A) of some event A ⊆ Ω in the stationary distribution.
Then the result may be out by a constant amount, unless we take t exponentially large.

The main idea of the proof is to relate the mixing time of the Burnside
process to the “Swendsen-Wang process”, a particular dynamics for the Potts
model in statistical physics. Gore and Jerrum [6] showed that the Swendsen-Wang process
has exponential mixing time at a certain critical value of
a parameter called “temperature”. It turns out that the Swendsen-Wang process
defined on a graph 𝒢, at a different (lower, non-critical) temperature, has
exactly the same dynamics as the Burnside process on a derived permutation
group G_3(𝒢). Thus we only have to relate the Swendsen-Wang process at the two
different temperatures, which we do using the “ℓ-stretch” construction used by
other authors [7]. The ℓ-stretch construction does not perfectly preserve the dynamics
of the Swendsen-Wang process, but the correspondence is close enough
to yield the claimed result.

Sections 2 and 3 describe the Burnside and Swendsen-Wang processes. Section 4
describes the relationship between the two. Section 5 relates the Swendsen-Wang
process at two different temperatures via the ℓ-stretch construction, thus
completing the “torpid mixing” proof. Finally, Section 6 concludes with some
open problems.

2 The Burnside process 

Let Σ = {0, ..., k - 1} be a finite alphabet of cardinality k, and G a permutation
group on [m] = {0, ..., m - 1}. For g ∈ G and i ∈ [m], denote by i^g the image
of i under g. The group G has a natural action on the set Σ^m of all words
of length m over the alphabet Σ, induced by permutations of the “positions”
0, ..., m - 1. Under this induced action, the permutation g ∈ G maps the word
α = a_0 a_1 ... a_{m-1} to the word α^g = β = b_0 b_1 ... b_{m-1} defined by b_j = a_i for
all i, j ∈ [m] satisfying i^g = j. The action of G partitions Σ^m into a number of
orbits, these being the equivalence classes of Σ^m under the equivalence relation
that identifies α and β whenever there exists g ∈ G mapping α to β. The orbit
{α^g : g ∈ G} containing the word α ∈ Σ^m is denoted α^G. As we indicated in
the introduction, Burnside's Lemma says that each orbit comes up |G| times in
the set Γ(Σ, G) defined in equation (1). Thus, we are interested in the problem
of uniformly sampling elements of Γ(Σ, G).
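For a very small case, Γ(Σ, G) and Burnside's Lemma can be checked by brute force. The Python sketch below makes illustrative assumptions: Σ = {0, 1}, m = 4, and G the cyclic group of rotations, so that orbits are binary necklaces. The helper names are also our own. It follows the convention above: g maps α to β with b_j = a_i whenever i^g = j, where a permutation is stored as a tuple with g[i] = i^g.

```python
from itertools import product

def act(word, g):
    """alpha^g: position i of alpha moves to position g[i] (b_j = a_i when i^g = j)."""
    image = [None] * len(word)
    for i, letter in enumerate(word):
        image[g[i]] = letter
    return tuple(image)

def gamma(k, group, m):
    """Gamma(Sigma, G) for Sigma = {0, ..., k-1}: pairs (alpha, g) with alpha^g = alpha."""
    return [(w, g) for w in product(range(k), repeat=m) for g in group if act(w, g) == w]

def orbits(k, group, m):
    """Partition Sigma^m into orbits alpha^G."""
    seen, result = set(), []
    for w in product(range(k), repeat=m):
        if w not in seen:
            orbit = {act(w, g) for g in group}
            seen |= orbit
            result.append(orbit)
    return result

m = 4
G = [tuple((i + r) % m for i in range(m)) for r in range(m)]  # rotations: binary necklaces
pairs = gamma(2, G, m)
orbs = orbits(2, G, m)
print(len(pairs), len(G), len(orbs))  # 24 4 6

# Burnside's Lemma: every orbit appears exactly |G| times as a first component.
for orbit in orbs:
    print(min(orbit), sum(1 for w, _ in pairs if w in orbit))
```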

A standard attack on combinatorial sampling problems [10] is to design a
Markov chain whose states are the structures of interest (in this case the state
space is G) and whose transition probabilities are chosen so that the stationary
distribution is the required sampling distribution. The following natural Markov
chain was proposed by Jerrum [8]. As we noted in the introduction, it is essentially
a random walk on the bipartite graph which corresponds to Burnside's
Lemma. The state space of the Markov chain M_B = M_B(G, Σ) is just G. The
transition probabilities from a state g ∈ G are specified by the following conceptually
simple two-step experiment:

(B1) Sample α uniformly at random (u.a.r.) from the set Fix g := {α ∈ Σ^m : α^g = α}.
(B2) Sample h u.a.r. from the point stabiliser G_α := {h ∈ G : α^h = α}.

The new state is h. Algorithmically, it is not difficult to implement (B1). However,
Step (B2) is apparently difficult in general. (It is equivalent under randomised
polynomial-time reductions to the Setwise Stabiliser problem, which
includes Graph Isomorphism as a special case.) Nevertheless, there are significant
classes of groups G for which an efficient (polynomial time) implementation
exists. Luks has shown that p-groups (groups in which every element has order
a power of p for some prime p) are an example of such a class [11].
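A single transition of M_B can be sketched for a toy group. Step (B1) is easy: a word is fixed by g exactly when it is constant on each cycle of g. Choosing one uniform letter per cycle therefore gives a uniform element of Fix g, which has k^{c(g)} elements. Step (B2) is done here by brute force over the whole group. That is only feasible for tiny groups and is not the efficient method for p-groups mentioned above. The cyclic group C_6, the seed and the helper names are illustrative assumptions.

```python
import random

def cycles(g):
    """Cycle decomposition of g, where g[i] is the image of position i."""
    seen, result = set(), []
    for start in range(len(g)):
        if start not in seen:
            cycle, i = [], start
            while i not in seen:
                seen.add(i)
                cycle.append(i)
                i = g[i]
            result.append(cycle)
    return result

def act(word, g):
    """alpha^g: the letter at position i moves to position g[i]."""
    image = [None] * len(word)
    for i, letter in enumerate(word):
        image[g[i]] = letter
    return tuple(image)

def burnside_step(g, group, k, rng):
    """One transition of M_B from state g; returns (new state h, the word alpha used)."""
    # (B1): one uniform letter per cycle of g gives a uniform element of Fix g.
    alpha = [None] * len(g)
    for cycle in cycles(g):
        letter = rng.randrange(k)
        for i in cycle:
            alpha[i] = letter
    alpha = tuple(alpha)
    # (B2): brute-force point stabiliser G_alpha; only viable for tiny groups.
    stabiliser = [h for h in group if act(alpha, h) == alpha]
    return rng.choice(stabiliser), alpha

rng = random.Random(3)
m = 6
G = [tuple((i + r) % m for i in range(m)) for r in range(m)]  # cyclic group C_6
state = G[0]
for _ in range(5):
    state, alpha = burnside_step(state, G, 2, rng)
    print(alpha, "->", state)
```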

Returning to the Markov chain itself, we note immediately that M_B is ergodic,
since every state (permutation) can be reached from every other in a single
transition, by selecting the word α = 0^m in step (B1). Let π : G → [0, 1] denote
the stationary distribution of M_B. Then π(g) is proportional to the degree of
vertex g in the bipartite graph corresponding to Burnside's Lemma, which is
|Fix g| = k^{c(g)}, where c(g) denotes the number of cycles in the permutation g.
We have therefore established the following Lemma from [9]:

Lemma 1. Let π be the stationary distribution of the Markov chain M_B(G, Σ).
Then π(g) = |Fix g| / |Γ(Σ, G)| for all g ∈ G.

Although the Markov chain M_B on G is the most convenient one for us to
work with, it is clear that we can invert the order of steps (B1) and (B2) to
obtain a dual Markov chain M'_B(G, Σ) with state space Σ^m. The dual Markov
chain⁴ has greater practical appeal, as it gives a uniform sampler for orbits (i.e.,
unlabelled structures):

Lemma 2. Let π' be the stationary distribution of the Markov chain M'_B(G, Σ).
Then

$$\pi'(\alpha) = \frac{|G|}{|\alpha^G|\,|\Gamma(\Sigma,G)|}$$

for all α ∈ Σ^m; in particular, π' assigns equal probability to each orbit α^G.

The result again follows from a consideration of the random walk on the bipartite
graph, using the elementary group-theoretic fact that |G_α| × |α^G| = |G|.

⁴ In references [8] and [9], the primed and unprimed versions are reversed.
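Both lemmas can be checked exactly on a tiny example. The Python sketch below builds the exact transition probabilities of M_B and of the dual chain M'_B with `fractions.Fraction`. The example uses binary words of length 4 under the cyclic group of rotations, and all names are our own assumptions. The sketch verifies that π(g) = |Fix g|/|Γ| and π'(α) = |G_α|/|Γ| = |G|/(|α^G||Γ|) are stationary, and that every orbit receives the same total mass.

```python
from fractions import Fraction
from itertools import product

def act(word, g):
    """alpha^g: the letter at position i moves to position g[i]."""
    image = [None] * len(word)
    for i, letter in enumerate(word):
        image[g[i]] = letter
    return tuple(image)

def burnside_matrices(k, group, m):
    """Exact transition probabilities of M_B (on G) and the dual M'_B (on words)."""
    words = list(product(range(k), repeat=m))
    fix = {g: [w for w in words if act(w, g) == w] for g in group}
    stab = {w: [h for h in group if act(w, h) == w] for w in words}
    P = {g: {h: Fraction(0) for h in group} for g in group}
    for g in group:                      # (B1) then (B2)
        for w in fix[g]:
            for h in stab[w]:
                P[g][h] += Fraction(1, len(fix[g]) * len(stab[w]))
    Q = {a: {b: Fraction(0) for b in words} for a in words}
    for a in words:                      # (B2) then (B1)
        for h in stab[a]:
            for b in fix[h]:
                Q[a][b] += Fraction(1, len(stab[a]) * len(fix[h]))
    return words, fix, stab, P, Q

m, k = 4, 2
G = [tuple((i + r) % m for i in range(m)) for r in range(m)]  # rotations of 4 positions
words, fix, stab, P, Q = burnside_matrices(k, G, m)
gamma_size = sum(len(fix[g]) for g in G)                       # |Gamma(Sigma, G)|

pi = {g: Fraction(len(fix[g]), gamma_size) for g in G}         # Lemma 1
pi_dual = {a: Fraction(len(stab[a]), gamma_size) for a in words}  # Lemma 2

print(all(sum(pi[g] * P[g][h] for g in G) == pi[h] for h in G))                   # True
print(all(sum(pi_dual[a] * Q[a][b] for a in words) == pi_dual[b] for b in words))  # True

orbit_mass = {}
for a in words:
    rep = min(act(a, g) for g in G)  # canonical representative of the orbit a^G
    orbit_mass[rep] = orbit_mass.get(rep, 0) + pi_dual[a]
print(sorted(set(orbit_mass.values())))  # [Fraction(1, 6)]
```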

Peter Cameron has observed that a Markov chain similar to $M'_B$ may be defined for any group action, not just the special case of a permutation group $G$ acting on $\Sigma^m$ by permutation of positions. In the general setting: given a point $a$, select u.a.r. a group element $g$ that fixes $a$, and then select u.a.r. a point that is fixed by $g$. Thus, the generalisation of $M'_B$ to arbitrary group actions provides a potentially efficient procedure for uniformly sampling unlabelled structures (i.e., sampling structures up to symmetry). This procedure has been implemented in certain algorithms for determining the conjugacy classes of a finite group [16].

Of course, the effectiveness of $M'_B$ (equivalently $M_B$) as a basis for a general-purpose sampling procedure for unlabelled structures depends on its mixing time. It was known that $M_B$ mixes rapidly in some special cases (see Jerrum [8]), but it was not previously known whether $M_B$ mixes rapidly for all groups $G$. Specifically, it was not known whether the mixing time of $M_B(G,\Sigma)$ is uniformly bounded by a polynomial in $m$, the degree of $G$. The result in this article is a construction of an infinite family of permutation groups for which we show that the mixing time of $M_B$ grows exponentially in the degree $m$.



3 The Swendsen-Wang process 

As noted in the Introduction, our strategy is to relate the mixing time of the Burnside process to that of the Swendsen-Wang process. In this section we describe the latter process, which provides a particular dynamics for the $q$-state Potts model. In fact, we need only consider the special case $q = 3$. See Martin's book [13] for background on the Potts model.

A (3-state) Potts system is defined by a graph $\Gamma = (V,E)$ and a real number $\beta$ ("inverse temperature"). For compactness, we will sometimes denote an edge $(i,j) \in E$ by $ij$. A configuration of the system is an assignment $\sigma : V \to \{0,1,2\}$ of "spins" or colours to the vertices of $\Gamma$. The set of all $3^{|V|}$ possible configurations is denoted by $\Omega$. We associate each configuration $\sigma \in \Omega$ with an energy
$$H(\sigma) := \sum_{ij \in E} \bigl[1 - \delta(\sigma(i),\sigma(j))\bigr],$$
where $\delta$ is the Kronecker $\delta$-function, which is 1 if its arguments are equal, and 0 otherwise. Thus the energy of a configuration is just the number of edges connecting unlike colours. The (Boltzmann) weight of a configuration $\sigma$ is $\exp(-\beta H(\sigma))$. The partition function of the 3-state Potts model is
$$Z = Z(\Gamma,\beta) := \sum_{\sigma \in \Omega} \exp(-\beta H(\sigma)); \tag{2}$$
it is the normalising factor in the Gibbs distribution on configurations, which assigns probability $\exp(-\beta H(\sigma))/Z$ to configuration $\sigma$. To avoid the exponentials, we will define the edge weight $\lambda$ of the Potts system to be $e^{-\beta}$, so the partition function (2) may be rewritten as
$$Z = Z(\Gamma,\lambda) = \sum_{\sigma \in \Omega} \prod_{ij \in E} \lambda^{[1 - \delta(\sigma(i),\sigma(j))]}. \tag{3}$$
Thus the weight of a configuration is $\lambda^h$, where $h$ is the number of bichromatic edges.

The Swendsen-Wang process specifies a Markov chain $M_{SW}(\Gamma,\lambda)$ on $\Omega$. Let the current Potts configuration be denoted by $\sigma$. The next configuration $\sigma'$ is obtained as follows.

(SW1) Select a subset $A$ of the set $\{ij \in E : \sigma(i) = \sigma(j)\}$ of monochromatic edges by retaining each monochromatic edge independently with probability $p = 1 - \lambda$.

(SW2) The graph $(V, A)$ consists of a number of connected components. For each connected component, a colour is chosen u.a.r. from $\{0,1,2\}$, and all vertices within the component are assigned that colour.

That the Markov chain with transitions defined by this experiment is ergodic is immediate; that it has the correct (i.e., Gibbs) stationary distribution is not too difficult to show. (See, for example, Edwards and Sokal [4].) Both the Swendsen-Wang process ($M_{SW}(\Gamma,\lambda)$) and the Burnside process ($M_B(G,\Sigma)$) are examples of Markov chains using the "method of auxiliary variables" (see [4] and [1]).
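Steps (SW1) and (SW2) translate directly into code. The Python sketch below performs one Swendsen-Wang update for the 3-state Potts model on a graph given as an edge list, using a union-find structure to identify the connected components of the retained edges. The function and argument names are hypothetical; `lam` is the edge weight $\lambda$, so each monochromatic edge is kept with probability $p = 1 - \lambda$.

```python
import random


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def swendsen_wang_step(n_vertices, edges, sigma, lam, rng):
    """One transition of M_SW(Gamma, lam) for the 3-state Potts model.

    edges: list of (i, j) vertex pairs; sigma: list of colours in {0, 1, 2}.
    Returns the new configuration and the retained edge set A.
    """
    p = 1.0 - lam
    # (SW1): keep each monochromatic edge independently with probability p.
    kept = [(i, j) for i, j in edges if sigma[i] == sigma[j] and rng.random() < p]
    # (SW2): find the connected components of (V, A) ...
    parent = list(range(n_vertices))
    for i, j in kept:
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[ri] = rj
    # ... and give each component a colour chosen u.a.r. from {0, 1, 2}.
    colour_of_root = {}
    new_sigma = []
    for v in range(n_vertices):
        root = find(parent, v)
        if root not in colour_of_root:
            colour_of_root[root] = rng.randrange(3)
        new_sigma.append(colour_of_root[root])
    return new_sigma, kept


rng = random.Random(1)
cycle5 = [(i, (i + 1) % 5) for i in range(5)]
sigma, kept = swendsen_wang_step(5, cycle5, [0, 0, 1, 1, 2], 0.5, rng)
```

Iterating `swendsen_wang_step` from any starting configuration simulates $M_{SW}(\Gamma,\lambda)$; by construction every retained edge ends up monochromatic in the new configuration.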

4 The relationship between the Burnside process and the 
Swendsen-Wang process 

Let $\Sigma$ be a finite alphabet of size $k$, and let $\Gamma = (V,E)$ be an undirected graph defining a 3-state Potts system with edge weight $\lambda = k^{-2}$. We will construct an associated permutation group $G_3(\Gamma)$ such that the dynamics of the Burnside process on $(G_3(\Gamma), \Sigma)$ is essentially the same as the Swendsen-Wang dynamics on $(\Gamma, \lambda)$. This construction generalises a construction from [8], which deals with the case $k = 2$ (i.e., the binary alphabet case).

The permutation group $G_3(\Gamma)$ acts on the set $\Lambda = \bigcup_{e \in E} \Lambda_e$, which is the disjoint union of three-element sets $\Lambda_e$ (so $G_3(\Gamma)$ has degree $m = |\Lambda| = 3|E|$). Arbitrarily orient the edges of $\Gamma$, so that each edge $e \in E$ has a defined start-vertex $e^-$ and end-vertex $e^+$. For $e \in E$, denote by $h_e$ some fixed permutation that induces a 3-cycle on $\Lambda_e$ and leaves everything else fixed, and for $v \in V$ denote by $g_v$ the generator
$$g_v := \prod_{e : e^+ = v} h_e \prod_{e : e^- = v} h_e^{-1}.$$
Finally, define $G_3(\Gamma) = \langle g_v : v \in V \rangle$, the group generated by $\{g_v\}$.

Observe that the generators of $G_3(\Gamma)$ commute and have order three, so each permutation $g \in G_3(\Gamma)$ can be expressed as
$$g = g(\sigma) := \prod_{v \in V} g_v^{\sigma(v)} = \prod_{e \in E} h_e^{\sigma(e^+) - \sigma(e^-)}, \tag{4}$$
where $\sigma : V \to \{0,1,2\}$. Provided the graph $\Gamma$ is connected, this expression is essentially canonical, in that $\sigma$ is uniquely determined up to addition (mod 3) of a constant function. To see this, note that $g$ uniquely determines the exponent of $h_e$ in expansion (4), which in turn determines the difference between the colours (viewed as integers) at the endpoints of edge $e$. Note that all three of the configurations associated with $g$ induce the same set of monochromatic edges in (SW1). Thus, the transition probabilities from the three configurations are the same, and we can therefore think of $g$ as being associated with all three configurations.
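Expansion (4) can be checked mechanically on a small graph. In the sketch below (an illustration with hypothetical helper names), the block $\Lambda_e$ of edge number $e$ is the point set $\{3e, 3e+1, 3e+2\}$, $h_e$ rotates that block by one step, and each edge is given as a pair (start-vertex $e^-$, end-vertex $e^+$). The code builds $g(\sigma) = \prod_v g_v^{\sigma(v)}$ by composing generators and compares it with $\prod_e h_e^{\sigma(e^+) - \sigma(e^-)}$ for every colouring of a triangle.

```python
import itertools


def rotate_blocks(exponents):
    """Permutation of Lambda applying h_e^{x_e} (a rotation) on block e = {3e, 3e+1, 3e+2}."""
    return tuple(3 * e + (r + x) % 3 for e, x in enumerate(exponents) for r in range(3))


def compose(p, q):
    """Apply permutation p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))


def generator(v, edges):
    """g_v: h_e on each edge ending at v, h_e^{-1} on each edge starting at v."""
    return rotate_blocks([(head == v) - (tail == v) for tail, head in edges])


def g_of_sigma(sigma, edges):
    """g(sigma) = product over v of g_v^{sigma(v)}, built by repeated composition."""
    g = tuple(range(3 * len(edges)))  # identity on Lambda
    for v, power in enumerate(sigma):
        for _ in range(power):
            g = compose(g, generator(v, edges))
    return g


triangle = [(0, 1), (1, 2), (0, 2)]  # oriented edges (e^-, e^+)
images = {}
for sigma in itertools.product(range(3), repeat=3):
    g = g_of_sigma(sigma, triangle)
    # Expansion (4): the exponent of h_e is sigma(e^+) - sigma(e^-) (mod 3).
    assert g == rotate_blocks([sigma[h] - sigma[t] for t, h in triangle])
    images.setdefault(g, []).append(sigma)
```

Because the triangle is connected, the 27 colourings collapse onto 9 permutations, three colourings (differing by a constant shift mod 3) per permutation.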

Lemma 3. Suppose $\Gamma$ is a connected graph, $\Sigma$ a finite alphabet, and let $k = |\Sigma|$. Then
$$M_B(G_3(\Gamma), \Sigma) \cong M_{SW}(\Gamma, k^{-2});$$
that is to say, each permutation $g$ in the state space of $M_B(G_3(\Gamma), \Sigma)$ can be associated with exactly three configurations in the state space of $M_{SW}(\Gamma, k^{-2})$ in such a way that transition probabilities are preserved.

Proof. We associate each permutation $g \in G_3(\Gamma)$ with three configurations as described above. As we observed, the transition probabilities of the three configurations in $M_{SW}$ are identical.

Perhaps the easiest way to show that these transition probabilities are the same as those in $M_B$ is to combine the experiment defining the Burnside process (see (B1) and (B2)) with that defining the Swendsen-Wang process (see (SW1) and (SW2)) into a single coupled version. Start with the pair $(g, \sigma_g)$, where $\sigma_g$ is one of the three configurations associated with $g$.

(C1) Sample $\alpha$ u.a.r. from the set $\mathrm{Fix}(g) = \{\alpha \in \Sigma^m : \alpha^g = \alpha\}$ of words fixed by $g$. Let $A := \{e \in E : \alpha \text{ is not constant on } \Lambda_e\}$. The pair $(\alpha, A)$ is the intermediate state.

(C2) Sample $h$ u.a.r. from the point stabiliser $G_\alpha = \{h \in G : \alpha^h = \alpha\}$.

The new pair is $(h, \sigma_h)$ (again, choose $\sigma_h$ arbitrarily from the three configurations associated with $h$).

By construction, the transitions $g \to \alpha \to h$ occur with the probabilities dictated by (B1) and (B2). We must check that the induced transitions $\sigma_g \to A \to \sigma_h$ match (SW1) and (SW2) in probability. Let $e = uv \in E$ be any edge, and consider the action of $g$ on $\Lambda_e$. If $\sigma_g(u) = \sigma_g(v)$ then the action of $g$ on $\Lambda_e$ is the identity, and the probability that $\alpha$ is constant on $\Lambda_e$ is $k/k^3 = k^{-2}$. Thus the probability that $e \in A$ is $1 - k^{-2}$, independent of the other edge choices, as required by (SW1), where $\lambda = k^{-2}$. Otherwise, $\sigma_g(u) \neq \sigma_g(v)$ and the action of $g$ on $\Lambda_e$ is a 3-cycle. Necessarily, $\alpha$ is constant on $\Lambda_e$, and $e \notin A$, again as required by (SW1). So the distribution of $A \subseteq E$ is correct.

To verify the second step, again let $e = uv \in E$ be any edge. If $e \in A$ then $\alpha$ is not constant on $\Lambda_e$, entailing that the action of $h$ on $\Lambda_e$ is the identity and $\sigma_h(u) = \sigma_h(v)$. Conversely, if $e \notin A$ then $\alpha$ is constant on $\Lambda_e$, and $\sigma_h(u) - \sigma_h(v)$ is unconstrained. Thus $h \mapsto \sigma_h$ is a bijection from $G_\alpha$ to configurations that are constant on connected components of $(V, A)$, and the distribution of $\sigma_h$ is as demanded by (SW2).
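The two counting facts behind this proof are easy to confirm by enumeration: when $g$ acts trivially on $\Lambda_e$, a fraction $k/k^3 = k^{-2}$ of the fixed words are constant on $\Lambda_e$, whereas when $g$ acts as a 3-cycle, every fixed word is constant there. The Python sketch below (function name hypothetical) checks both for several alphabet sizes $k$.

```python
import itertools
from fractions import Fraction


def constant_fraction(k, three_cycle):
    """Among words on a 3-point block that are fixed by the block action
    (identity, or a 3-cycle if three_cycle is True), the fraction that are constant."""
    words = list(itertools.product(range(k), repeat=3))
    if three_cycle:
        fixed = [w for w in words if (w[2], w[0], w[1]) == w]  # invariant under rotation
    else:
        fixed = words  # the identity fixes every word
    constant = [w for w in fixed if w[0] == w[1] == w[2]]
    return Fraction(len(constant), len(fixed))


for k in range(2, 6):
    print(k, constant_fraction(k, False), constant_fraction(k, True))
```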
5 Torpid mixing



We have seen that the Burnside process is equivalent to the Swendsen-Wang process at a particular edge weight $\lambda$; and it is known that the Swendsen-Wang process at a different edge weight (which is approximately $1 - (4\ln 2)/|V|$, where $V$ is the vertex set of $\Gamma$) has exponential mixing time [6]. In this section we bridge the gap between the different edge weights.

Denote by $P_l$ the path of length $l$, or $l$-path, i.e., the graph with vertex set $[l+1] = \{0, 1, \ldots, l\}$ and edge set $\{\{i, i+1\} : 0 \le i < l\}$.

Lemma 4. Consider a randomly sampled configuration of the 3-state Potts model on $P_l$ with edge weight $\lambda$. The induced distribution of colours on the two end vertices of $P_l$ is identical to the distribution of configurations of the 3-state Potts model on $P_1$ ($= K_2$) with edge weight
$$\lambda(l) := \frac{(1+2\lambda)^l - (1-\lambda)^l}{(1+2\lambda)^l + 2(1-\lambda)^l}. \tag{5}$$



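Proof. Define $w^{(l)} \in \mathbb{R}^2$ to be the vector whose first (respectively, second) component $w^{(l)}_0$ (respectively, $w^{(l)}_1$) is the total weight of those configurations on $P_l$ whose (ordered) endpoints have colours $(0,0)$ (respectively, $(0,1)$). Clearly, there is nothing special in the particular choice of colours; the pair $(0,0)$ could be replaced by any pair of like colours, and $(0,1)$ by any pair of unlike ones. Introduce the matrix
$$T := \begin{pmatrix} 1 & 2\lambda \\ \lambda & 1+\lambda \end{pmatrix};$$
a straightforward induction on $l$ establishes
$$w^{(l)} = T^l \begin{pmatrix} 1 \\ 0 \end{pmatrix}.$$
The matrix $T$ has eigenvalues $1-\lambda$ and $1+2\lambda$. Introduce two further matrices
$$D := \begin{pmatrix} 1-\lambda & 0 \\ 0 & 1+2\lambda \end{pmatrix} \quad\text{and}\quad S := \begin{pmatrix} 2 & 1 \\ -1 & 1 \end{pmatrix}.$$
Then $T = SDS^{-1}$ and hence $T^l = SD^lS^{-1}$. Noting that
$$S^{-1} = \frac13 \begin{pmatrix} 1 & -1 \\ 1 & 2 \end{pmatrix},$$
we obtain
$$w^{(l)} = SD^lS^{-1}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \frac13 \begin{pmatrix} (1+2\lambda)^l + 2(1-\lambda)^l \\ (1+2\lambda)^l - (1-\lambda)^l \end{pmatrix}. \tag{6}$$
Since $P_l$ is equivalent (in the sense of the statement of the lemma) to a single edge with effective weight $w^{(l)}_1 / w^{(l)}_0$, Lemma 4 follows immediately.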
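Lemma 4 and formula (6) can be double-checked with exact arithmetic. The sketch below computes $w^{(l)}$ in three ways: by brute-force summation over colourings of $P_l$, by iterating the transfer matrix $T$, and by the closed form (6); it then checks that $w^{(l)}_1 / w^{(l)}_0$ equals $\lambda(l)$ from (5). It assumes nothing beyond the definitions in the proof; the function names are illustrative.

```python
import itertools
from fractions import Fraction


def weights_brute(lam, l):
    """(w0, w1): total weight lam^(#bichromatic edges) of colourings of the
    l-path 0..l whose endpoints are coloured (0, 0), respectively (0, 1)."""
    w = [Fraction(0), Fraction(0)]
    for inner in itertools.product(range(3), repeat=l - 1):
        for end in (0, 1):
            cols = (0,) + inner + (end,)
            w[end] += lam ** sum(cols[i] != cols[i + 1] for i in range(l))
    return tuple(w)


def weights_transfer(lam, l):
    """w^(l) = T^l (1, 0)^T with T = [[1, 2 lam], [lam, 1 + lam]]."""
    w0, w1 = Fraction(1), Fraction(0)
    for _ in range(l):
        w0, w1 = w0 + 2 * lam * w1, lam * w0 + (1 + lam) * w1
    return w0, w1


def weights_closed_form(lam, l):
    """Closed form (6)."""
    a, b = (1 + 2 * lam) ** l, (1 - lam) ** l
    return (a + 2 * b) / 3, (a - b) / 3


def lam_of_l(lam, l):
    """Effective edge weight lambda(l), equation (5)."""
    a, b = (1 + 2 * lam) ** l, (1 - lam) ** l
    return (a - b) / (a + 2 * b)
```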
Denote by $K_n \otimes P_l$ the graph obtained from the complete graph on $n$ vertices by subdividing each edge by $l - 1$ intermediate vertices of degree two. Thus each edge of $K_n$ becomes in $K_n \otimes P_l$ a copy of the $l$-path $P_l$. We refer to the vertices of degree $n - 1$ as exterior vertices and those of degree two as interior. (Assume $n > 3$ to avoid trivialities.) We remark that this construction is just the "$l$-stretch", used in related situations by Jaeger, Vertigan and Welsh [7]. The $l$-stretch operation allows us to move between different edge weights, at least if we forget for a moment the specific dynamics imposed by the Swendsen-Wang process.

Lemma 5. Consider a randomly sampled configuration of the 3-state Potts model on $K_n \otimes P_l$ with edge weight $\lambda$. The induced distribution of colours on the exterior vertices of $K_n \otimes P_l$ is identical to the distribution of configurations of the 3-state Potts model on $K_n$ with edge weight $\lambda(l)$, as defined in (5).



Proof. Suppose $\sigma$ is any Potts configuration on the graph $K_n \otimes P_l$, and $S$ is any subset of its vertices. Denote by $\sigma|_S \in \{0,1,2\}^S$ the restriction of $\sigma$ to the set $S$. Through some elementary algebraic manipulation, we may express the partition function of a Potts system on $K_n \otimes P_l$ in terms of the partition function of a Potts system on $K_n$ with edge weight closer to 1. In the following manipulation, we assume that the vertices of $K_n \otimes P_l$ are numbered $0, \ldots, N-1$ and that the exterior vertices receive numbers in the range $0, \ldots, n-1$. Furthermore, $U_{ij} \subseteq [N]$ denotes the set of $l-1$ interior vertices lying on the $l$-path between exterior vertices $i$ and $j$, and $E_{ij}$ denotes the set of edges on that path.
$$\begin{aligned}
Z(K_n \otimes P_l, \lambda) &= \sum_{\sigma} \prod_{uv \in E} \lambda^{[1-\delta(\sigma(u),\sigma(v))]} \\
&= \sum_{\sigma|_{[n]}} \prod_{0 \le i < j \le n-1} \sum_{\sigma|_{U_{ij}}} \prod_{uv \in E_{ij}} \lambda^{[1-\delta(\sigma(u),\sigma(v))]} \\
&= \sum_{\sigma|_{[n]}} \prod_{0 \le i < j \le n-1} C\,\lambda(l)^{[1-\delta(\sigma(i),\sigma(j))]} \\
&= C^{n(n-1)/2}\, Z(K_n, \lambda(l)),
\end{aligned}$$
where $C$ is a constant (actually $w^{(l)}_0$). The penultimate equality above uses Lemma 4.

Let $\hat\sigma \in \{0,1,2\}^n$ be any configuration on $K_n$. From the above manipulation, we see that the weight of the configuration $\hat\sigma$ on $K_n$ is equal (modulo the constant factor $C^{n(n-1)/2}$) to the sum of the weights of configurations $\sigma$ of $K_n \otimes P_l$ that agree with $\hat\sigma$ on the exterior vertices or, symbolically, $\sigma|_{[n]} = \hat\sigma$. This proves Lemma 5.
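A small exhaustive computation illustrates Lemma 5 together with the constant $C = w^{(l)}_0$. The sketch below builds $K_n \otimes P_l$ for $n = 4$, $l = 2$ (so $N = 10$ vertices and $3^{10}$ configurations), sums the Potts weights over all interior colourings for each exterior colouring, and compares the result with $C^{n(n-1)/2}\,\lambda(l)^h$, where $h$ is the number of bichromatic edges of $K_n$. The instance size and the helper names are our own choices for illustration.

```python
import itertools
from fractions import Fraction


def stretched_complete_graph(n, l):
    """K_n (x) P_l: exterior vertices 0..n-1; each edge ij of K_n becomes an l-path."""
    edges, next_vertex = [], n
    for i, j in itertools.combinations(range(n), 2):
        path = [i] + list(range(next_vertex, next_vertex + l - 1)) + [j]
        next_vertex += l - 1
        edges.extend(zip(path, path[1:]))
    return next_vertex, edges


def exterior_marginal(n, l, lam):
    """Unnormalised Potts weight of each exterior colouring, summed over interiors."""
    N, edges = stretched_complete_graph(n, l)
    power = [lam ** b for b in range(len(edges) + 1)]  # cache lam^b
    marginal = {}
    for sigma in itertools.product(range(3), repeat=N):
        bichromatic = sum(sigma[u] != sigma[v] for u, v in edges)
        key = sigma[:n]  # restriction to the exterior vertices
        marginal[key] = marginal.get(key, 0) + power[bichromatic]
    return marginal


n, l, lam = 4, 2, Fraction(1, 4)
a, b = (1 + 2 * lam) ** l, (1 - lam) ** l
lam_l = (a - b) / (a + 2 * b)   # lambda(l), equation (5)
C = (a + 2 * b) / 3             # w_0^(l), equation (6)
marginal = exterior_marginal(n, l, lam)
```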
Lemma 6. There exists an infinite sequence of pairs $\{(n(l), l) : l = 1, 2, \ldots\}$ such that
$$\left|(1 - \lambda(l)) - \frac{4\ln 2}{n(l)}\right| < \frac{3}{n(l)^2}$$
for all pairs, where $\lambda(l)$ is defined as in (5).



Proof. The function $1 - \lambda(l)$ decreases monotonically to 0 as $l \to \infty$. Given $l$, choose $n = n(l)$ to be the unique natural number satisfying
$$\frac{4\ln 2}{n(l)+1} < 1 - \lambda(l) \le \frac{4\ln 2}{n(l)}.$$
The upper and lower bounds differ by $4\ln 2/(n(l)(n(l)+1))$, which is less than $3n(l)^{-2}$. Thus, we have proved Lemma 6.
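The choice of $n(l)$ in this proof is simply $n(l) = \lfloor 4\ln 2/(1-\lambda(l)) \rfloor$. The sketch below computes it for the edge weight $\lambda = k^{-2}$ with $k = 2$ (the binary-alphabet case of the main theorem), checks the bound of Lemma 6, and reports the degree $m = 3|E| = 3l\,n(n-1)/2$ of the group $G_3(K_{n(l)} \otimes P_l)$, illustrating that $l$ grows only logarithmically in $n$. It rewrites $1 - \lambda(l)$ as $3r/(1+2r)$ with $r = ((1-\lambda)/(1+2\lambda))^l$, which is algebraically equal to the value given by (5) but avoids cancellation in floating point. The names are hypothetical.

```python
import math


def one_minus_lam_of_l(lam, l):
    """1 - lambda(l) = 3r / (1 + 2r) with r = ((1 - lam) / (1 + 2 lam))^l; see (5)."""
    r = ((1 - lam) / (1 + 2 * lam)) ** l
    return 3 * r / (1 + 2 * r)


def n_of_l(lam, l):
    """The unique n with 4 ln 2 / (n + 1) < 1 - lambda(l) <= 4 ln 2 / n."""
    return math.floor(4 * math.log(2) / one_minus_lam_of_l(lam, l))


def group_degree(n, l):
    """m = |Lambda| = 3 |E(K_n (x) P_l)| = 3 l n (n - 1) / 2."""
    return 3 * l * n * (n - 1) // 2


lam = 0.25  # lambda = k^{-2} with k = 2
for l in range(1, 21, 4):
    n = n_of_l(lam, l)
    print(l, n, group_degree(n, l))
```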



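Let $\Omega$ be the set of configurations of the 3-state Potts model on $K_n \otimes P_l$. For each configuration $\sigma \in \Omega$, define $\gamma(\sigma) \in \mathbb{R}^3$ to be the 3-vector whose $i$th component is the proportion of exterior vertices of $K_n \otimes P_l$ given colour $i$ by $\sigma$. Then let $\Omega_{1:1:1}(\varepsilon)$ (respectively, $\Omega_{4:1:1}(\varepsilon)$) denote the set of configurations $\sigma$ such that $\gamma(\sigma)$ lies within an $\varepsilon$-ball centred at $(\frac13, \frac13, \frac13)$ (respectively, one of the three $\varepsilon$-balls centred at $(\frac23, \frac16, \frac16)$, $(\frac16, \frac23, \frac16)$, or $(\frac16, \frac16, \frac23)$).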

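Lemma 7. Let a configuration $\sigma$ be sampled from the 3-state Potts model on $K_n \otimes P_l$ with edge weight $\lambda$, and suppose that $1 - \lambda(l) = (4\ln 2)/n + O(n^{-2})$. Then, for any $\varepsilon > 0$:

(i) $\Pr(\sigma \in \Omega_{1:1:1}(\varepsilon)) = \Omega(n^{-2})$;
(ii) $\Pr(\sigma \in \Omega_{4:1:1}(\varepsilon)) = \Omega(n^{-2})$; and
(iii) $\Pr(\sigma \notin \Omega_{1:1:1}(\varepsilon) \cup \Omega_{4:1:1}(\varepsilon)) = \exp(-\Omega(\sqrt n))$.

The implicit constants depend only on $\varepsilon$.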

Proof. By Lemma 5, we may equivalently work with the Potts model on $K_n$ with edge weight $\lambda(l)$.

When $1 - \lambda(l) = (4\ln 2)/n$, i.e., the error term is 0, this is precisely the result of Gore and Jerrum [6, Prop 3]. See also Bollobás, Grimmett and Janson [2]. The validity of the proof given in [6] is unaffected by the error term: an additive error $O(n^{-2})$ in $\lambda(l)$ translates to an additive perturbation $O(n^{-1})$ in the function $f$ in [6, eq. (2)]. This perturbation may be absorbed into the error term appearing in that equation, which is $O(1)$. Thus, we have proved Lemma 7.

We now need to compare the dynamics of the Swendsen-Wang processes on $K_n \otimes P_l$ and $K_n$; more precisely, the Markov chains $M_{SW}(K_n \otimes P_l, \lambda)$ and $M_{SW}(K_n, \lambda(l))$. The correspondence will not be exact, as it was in Lemma 3, but it will be close enough for our purposes.

Let $\mathcal{G}_{\nu,p}$ denote the standard random graph model in which an undirected $\nu$-vertex graph is formed by adding, independently with probability $p$, for each unordered pair of vertices $\{i,j\}$, an edge connecting $i$ and $j$. Suppose that $p \le d/\nu$, with $d < 1$ a constant, and $\Gamma$ is selected according to the model $\mathcal{G}_{\nu,p}$. It is a classical result that, with probability tending to 1 as $\nu \to \infty$, the connected components of $\Gamma$ all have size $O(\log \nu)$. We require a (fairly crude) large deviation version of this result.

Lemma 8. Let $\Gamma$ be selected according to the model $\mathcal{G}_{\nu,p}$, where $p \le d/\nu$ and $0 < d < 1$ is a constant. Then the probability that $\Gamma$ contains a component of size exceeding $\nu^{1/2}$ is $\exp(-\Omega(\nu^{1/2}))$.

Proof. This result in exactly this form appears as [6, Lemma 4]. See O'Connell [15, Thm 3.1] for a much more precise large-deviation result for the "giant component" of a sparse random graph.

We also need: 

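Lemma 9 (Hoeffding). Let $Z_1, \ldots, Z_s$ be independent r.v.'s with $a_i \le Z_i \le b_i$, for suitable constants $a_i, b_i$, and all $1 \le i \le s$. Also let $Z = \sum_{i=1}^s Z_i$. Then for any $t > 0$,
$$\Pr(|Z - \mathrm{Exp}\,Z| \ge t) \le 2\exp\left(-\frac{2t^2}{\sum_{i=1}^s (b_i - a_i)^2}\right).$$

Proof. See McDiarmid [12, Thm 5.7].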

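Lemma 10. Let a configuration $\sigma \in \Omega$ be sampled from the 3-state Potts model on $K_n \otimes P_l$ with edge weight $\lambda$, and suppose that $1 - \lambda(l) = (4\ln 2)/n + O(n^{-2})$. Let $\sigma' \in \Omega$ be the result of applying one step of the Swendsen-Wang process, starting at $\sigma$. Then, for any $\varepsilon > 0$,
$$\Pr(\sigma' \in \Omega_{1:1:1}(\varepsilon) \mid \sigma \in \Omega_{1:1:1}(\varepsilon)) = 1 - \exp(-\Omega(\sqrt n))$$
and
$$\Pr(\sigma' \in \Omega_{4:1:1}(\varepsilon) \mid \sigma \in \Omega_{4:1:1}(\varepsilon)) = 1 - \exp(-\Omega(\sqrt n)).$$
The implicit constants depend only on $\varepsilon$.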

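Proof. For $i, j$ exterior vertices of $K_n \otimes P_l$ satisfying $\sigma(i) = \sigma(j)$,
$$\Pr(\text{path } i \leftrightarrow j \text{ is monochromatic}) = \frac{1}{w^{(l)}_0} = \frac{3}{(1+2\lambda)^l + 2(1-\lambda)^l},$$
where the second equality is from (6). After step (SW1),
$$\begin{aligned}
\Pr(\text{path } i \leftrightarrow j \text{ is contained in } A) &= \Pr(\text{path } i \leftrightarrow j \text{ is monochromatic}) \times (1-\lambda)^l \\
&= \frac{3(1-\lambda)^l}{(1+2\lambda)^l + 2(1-\lambda)^l} \\
&= 1 - \lambda(l).
\end{aligned}$$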
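The identity $\Pr(\text{path } i \leftrightarrow j \subseteq A) = 1 - \lambda(l)$ can be confirmed by enumeration, conditioning on both endpoints having colour 0. The sketch below (hypothetical function names, exact arithmetic) weights each interior colouring of $P_l$ by $\lambda^{\#\text{bichromatic edges}}$, takes the all-0 colouring as the monochromatic event, multiplies by the retention probability $(1-\lambda)^l$ from (SW1), and compares the result with (5).

```python
import itertools
from fractions import Fraction


def prob_path_retained(lam, l):
    """Pr(all l edges of the path end up in A | both endpoints have colour 0)."""
    total = mono = Fraction(0)
    for inner in itertools.product(range(3), repeat=l - 1):
        cols = (0,) + inner + (0,)
        weight = lam ** sum(cols[i] != cols[i + 1] for i in range(l))
        total += weight
        if not any(inner):  # every interior vertex also has colour 0
            mono += weight
    # (SW1) keeps each monochromatic edge independently with probability 1 - lam.
    return mono / total * (1 - lam) ** l


def lam_of_l(lam, l):
    """Effective edge weight lambda(l), equation (5)."""
    a, b = (1 + 2 * lam) ** l, (1 - lam) ** l
    return (a - b) / (a + 2 * b)
```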
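For convenience, set $p = 1 - \lambda(l)$. Consider the set of exterior vertices of some given colour, and let $\nu \le (\frac13 + \varepsilon)n$ be the size of that set. Provided $\varepsilon$ is small enough ($\varepsilon = 1/40$ will do), $p\nu \le d < 1$ for some constant $d$. Distinct $l$-paths share no edges, so each path joining two exterior vertices of this colour is contained in $A$ independently with probability $p$; the graph these paths induce on the $\nu$ exterior vertices is therefore distributed as $\mathcal{G}_{\nu,p}$. By Lemma 8, with probability $1 - \exp(-\Omega(\sqrt\nu))$, the maximum number of exterior vertices in any connected component of the graph $([N], A)$ restricted to this colour-class is at most $\sqrt\nu$. (Recall that $[N]$ is the vertex set of $K_n \otimes P_l$.) Combining this observation for all three colours, and noting $\nu = \Theta(n)$, we obtain the following: with probability $1 - \exp(-\Omega(\sqrt n))$, the number of exterior vertices in any connected component of $([N], A)$ is at most $\sqrt n$.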

Let $s$ be the number of such components, and $n_1, \ldots, n_s$ be their respective sizes (counting exterior vertices only). The expected size of a colour-class constructed in step (SW2) is $n/3$, and because there are many components (at least $\sqrt n$) we expect the actual size of each colour-class to be close to the expectation. We quantify this intuition by appealing to the Hoeffding bound. Fix a colour, say 0, and define the random variables $Y_1, \ldots, Y_s$ and $Y$ by
$$Y_i = \begin{cases} n_i, & \text{if the $i$th component receives colour 0 in step (SW2);} \\ 0, & \text{otherwise,} \end{cases}$$
and $Y = \sum_{i=1}^s Y_i$. Then $\mathrm{Exp}\,Y = n/3$ and, by Lemma 9, for any $t > 0$,
$$\Pr(|Y - \mathrm{Exp}\,Y| \ge t) \le 2\exp\left(-\frac{2t^2}{\sum_{i=1}^s n_i^2}\right) \le 2\exp\left(-\frac{2t^2}{n^{3/2}}\right),$$
since
$$\sum_{i=1}^s n_i^2 \le \sqrt n \sum_{i=1}^s n_i = n^{3/2}.$$
Similar bounds apply, of course, to the other colours. Choosing $t = \varepsilon n/\sqrt3$ we see that, with probability $1 - \exp(-\Omega(\sqrt n))$, the size of every colour class in $\sigma'$ lies in the range $((\frac13 - \varepsilon/\sqrt3)n, (\frac13 + \varepsilon/\sqrt3)n)$; but this condition implies $\sigma' \in \Omega_{1:1:1}(\varepsilon)$.

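This proves the first part of Lemma 10, concerning $\Omega_{1:1:1}(\varepsilon)$; the second part of the lemma follows from the first by Lemma 7 and time-reversibility. In particular, it follows from the fact that $M_{SW}$ satisfies the detailed balance condition:
$$\Pr(\sigma = \sigma_1 \wedge \sigma' = \sigma_2) = \Pr(\sigma = \sigma_2 \wedge \sigma' = \sigma_1),$$
for all configurations $\sigma_1$ and $\sigma_2$, where $\sigma$ is sampled from the stationary distribution. Indeed, summing over $\sigma_1 \in \Omega_{4:1:1}(\varepsilon)$ and $\sigma_2 \notin \Omega_{4:1:1}(\varepsilon)$, the probability of leaving $\Omega_{4:1:1}(\varepsilon)$ in one step equals the probability of entering it. For $\varepsilon$ small enough that the balls are disjoint, the latter is at most $\Pr(\sigma \notin \Omega_{1:1:1}(\varepsilon) \cup \Omega_{4:1:1}(\varepsilon)) + \Pr(\sigma \in \Omega_{1:1:1}(\varepsilon) \wedge \sigma' \notin \Omega_{1:1:1}(\varepsilon)) = \exp(-\Omega(\sqrt n))$, and dividing by $\Pr(\sigma \in \Omega_{4:1:1}(\varepsilon))$, which by Lemma 7 is only polynomially small, gives the second claim.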
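The numerical side conditions in this proof are easy to tabulate. With $p \approx 4\ln 2/n$ and $\nu \le (\frac13 + \varepsilon)n$, the product $p\nu$ tends to $d(\varepsilon) = 4\ln 2\,(\frac13 + \varepsilon)$, which is below 1 exactly when $\varepsilon < 1/(4\ln 2) - 1/3 \approx 0.0273$; this is why $\varepsilon = 1/40$ suffices. Likewise, with $t = \varepsilon n/\sqrt3$ the Hoeffding exponent $2t^2/n^{3/2}$ equals $(2\varepsilon^2/3)\sqrt n$, which gives the $\exp(-\Omega(\sqrt n))$ bound. A short Python check (function names illustrative):

```python
import math


def d_limit(eps):
    """Limit of p * nu when p = 4 ln 2 / n and nu = (1/3 + eps) n."""
    return 4 * math.log(2) * (1 / 3 + eps)


def hoeffding_exponent(eps, n):
    """2 t^2 / n^{3/2} for t = eps n / sqrt(3), using sum n_i^2 <= n^{3/2}."""
    t = eps * n / math.sqrt(3)
    return 2 * t * t / n ** 1.5


eps_max = 1 / (4 * math.log(2)) - 1 / 3  # largest eps with d(eps) < 1
print(f"d(1/40) = {d_limit(1 / 40):.4f}, threshold eps = {eps_max:.4f}")
```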

It is now a short step to the main theorem. Recall that $\tau(\frac13)$ denotes the number of steps $t$ before the $t$-step distribution is within variation distance $\frac13$ of the stationary distribution (maximised over the choice of starting state).







Theorem 1. Let $\Sigma$ be a finite alphabet of size at least two. There exists an infinite family of permutation groups $G$ such that the mixing time of the Burnside process $M_B(G,\Sigma)$ is exponential in the degree $m$ of $G$; specifically, $\tau(\frac13) = \exp(\Omega(m^{1/4-\varepsilon}))$ for any $\varepsilon > 0$.

Proof. By Lemma 3, it is enough to exhibit an infinite family of graphs $\Gamma$ such that $M_{SW}(\Gamma, \lambda)$ has exponential mixing time, where $\lambda = k^{-2}$. This family of graphs will of course be $\{K_{n(l)} \otimes P_l : l \in \mathbb{N}\}$, where $n(l)$ is as defined in Lemma 6. The family of permutation groups promised by the theorem will then be $\{G_3(K_{n(l)} \otimes P_l) : l \in \mathbb{N}\}$.

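Consider a trajectory $(\sigma_t : t \in \mathbb{N})$ of $M_{SW}(K_n \otimes P_l, \lambda)$ starting in the stationary distribution. We say that the trajectory escapes at step $t$ if
$$\bigl(\sigma_t \in \Omega_{1:1:1}(\varepsilon) \wedge \sigma_{t+1} \notin \Omega_{1:1:1}(\varepsilon)\bigr) \vee \bigl(\sigma_t \in \Omega_{4:1:1}(\varepsilon) \wedge \sigma_{t+1} \notin \Omega_{4:1:1}(\varepsilon)\bigr).$$
For each $t$, by Lemma 10, the probability of escape at time $t$ is bounded by $\exp(-\Omega(\sqrt n))$. Furthermore, by Lemma 7 the probability of the event
$$\sigma_0 \notin \Omega_{1:1:1}(\varepsilon) \cup \Omega_{4:1:1}(\varepsilon)$$
is also bounded by $\exp(-\Omega(\sqrt n))$.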

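Thus there is a function $T = T(n) = \exp(\Omega(\sqrt n))$ such that, with probability at least $1 - \exp(-\Omega(\sqrt n))$, the initial segment of the trajectory $(\sigma_t : 0 \le t \le T)$ lies either entirely within $\Omega_{1:1:1}(\varepsilon)$ or entirely within $\Omega_{4:1:1}(\varepsilon)$. Hence there is an initial state $s \in \Omega_{1:1:1}(\varepsilon)$ such that $\Pr(\sigma_T \notin \Omega_{1:1:1}(\varepsilon) \mid \sigma_0 = s) \le \exp(-\Omega(\sqrt n))$, and similarly for $s \in \Omega_{4:1:1}(\varepsilon)$. Choose such an initial state $s$ from whichever of $\Omega_{1:1:1}(\varepsilon)$ or $\Omega_{4:1:1}(\varepsilon)$ has the smaller total weight in the stationary distribution. Then the variation distance of the $T$-step distribution from the stationary distribution is at least $\frac12 - \exp(-\Omega(\sqrt n))$; in particular, $\tau(\frac13) \ge T$ for all sufficiently large $n$. Finally, note that $m = O(n^2 l) = O(n^2 \log n)$, so that $\sqrt n = \Omega(m^{1/4-\varepsilon})$ for any $\varepsilon > 0$. (It is straightforward to see from Lemma 6 that $l = O(\log n)$.) Thus, we have proved Theorem 1.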

Although the definition of $\tau$ contains an existential quantification over initial states, it will be seen that Theorem 1 is not very sensitive to the initial state: $\tau(\frac13)$ can be replaced by $\tau_s(\frac13)$, where $s$ ranges over almost every state in $\Omega_{1:1:1}(\varepsilon)$ or $\Omega_{4:1:1}(\varepsilon)$, as appropriate ("almost every" being interpreted with respect to the stationary distribution).

6 Open problems 

In this paper, we have shown that the Burnside process is not rapidly mixing in general. It remains an open question whether there is some other polynomial-time method which achieves the same distribution as the Burnside process, either on permutations (as in Lemma 1) or on words (as in Lemma 2). Since (B1) is easy to implement in polynomial time, a polynomial-time sampling algorithm for the stationary distribution $\pi$ of Lemma 1 would yield a polynomial-time sampler for the stationary distribution $\pi'$ of Lemma 2 (i.e., the uniform distribution on orbits). If there is a polynomial-time sampling algorithm for the distribution $\pi$, this will imply [9] that there is a fully polynomial randomised approximation scheme for the single-variable cycle index polynomial for every integer $k$ (see [3]). Such a result would be a striking contrast to the result of the authors (see [5]), which shows that, unless NP = RP, no such approximation algorithm exists for any fixed rational non-integer $k$.
Abstract. We consider the standard Quicksort algorithm that sorts $n$ distinct keys with all possible $n!$ orderings of keys being equally likely. Equivalently, we analyze the total path length $\mathcal{L}_n$ in a randomly built binary search tree. Obtaining the limiting distribution of $\mathcal{L}_n$ is still an outstanding open problem. In this paper, we establish an integral equation for the probability density of the number of comparisons $\mathcal{L}_n$. Then, we investigate the large deviations of $\mathcal{L}_n$. We shall show that the left tail of the limiting distribution is much “thinner” (i.e., double exponential) than the right tail (which is only exponential). We use formal asymptotic methods of applied mathematics such as the WKB method and matched asymptotics.



1 Introduction 

Hoare's Quicksort algorithm [9] is the most popular sorting algorithm due to its good performance in practice. The basic algorithm can be briefly described as follows [9, 12, 14]:

A pivotal key is selected at random from the unsorted list of keys, and used to partition the keys into two sublists to which the same algorithm is called recursively until the sublists have size one or zero.
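The randomized scheme just described fits in a few lines. The Python sketch below charges one comparison to each non-pivot key in every partitioning step, so that a sublist of $m$ keys costs $m - 1$ comparisons; this is the cost model behind the recurrence for $\mathcal{L}_n$ that follows. The function name and this counting convention are illustrative assumptions, not the authors' code.

```python
import random


def quicksort(keys, rng):
    """Randomized Quicksort on distinct keys.

    Returns (sorted_keys, comparisons), charging one comparison per non-pivot
    key at each partitioning step (so a sublist of size m costs m - 1).
    """
    if len(keys) <= 1:
        return list(keys), 0
    pivot = rng.choice(keys)  # pivot chosen uniformly at random
    left = [x for x in keys if x < pivot]
    right = [x for x in keys if x > pivot]
    sorted_left, c_left = quicksort(left, rng)
    sorted_right, c_right = quicksort(right, rng)
    return sorted_left + [pivot] + sorted_right, len(keys) - 1 + c_left + c_right


rng = random.Random(2024)
keys = list(range(50))
rng.shuffle(keys)
result, comparisons = quicksort(keys, rng)
```

Averaging `comparisons` over many runs approximates $\mathrm{E}[\mathcal{L}_n]$, which Section 3 gives in closed form as $2(n+1)H_n - 4n$.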

To justify the algorithm's good performance in practice, a body of theory was built. First of all, every undergraduate learns in a data structures course that the algorithm sorts “on average” $n$ keys in $O(n \log n)$ steps. To be more precise, one assumes that all $n!$ possible orderings of keys are equally likely. It is, however, also known that in the worst case the algorithm needs $\Theta(n^2)$ steps (e.g., think of an input that is given in decreasing order when the output is printed in increasing order). Thus, one needs a more detailed probabilistic analysis to understand better the Quicksort behavior. In particular, one wants to know how likely (or rather unlikely) it is for such pathological behavior to occur. Our goal is to answer precisely this question.

* The work was supported by NSF Grant DMS-93-00136 and DOE Grant DE-FG02-93ER25168, as well as by NSF Grants NCR-9415491 and NCR-9804760.

M. Luby, J. Rolim, and M. Serna (Eds.): RANDOM'98, LNCS 1518, pp. 346-356, 1998.

A large body of literature is devoted to analyzing the Quicksort algorithm [3, 4, 5, 7, 12, 13, 14, 15, 16, 18]. However, many aspects of this problem are still largely unsolved. To review what is known and what is still unsolved, we introduce some notation. Let $\mathcal{L}_n$ denote the number of comparisons needed to sort a random list of length $n$. It is known that after selecting randomly a key, the two sublists are still “random” (cf. [12]). Clearly, the sorting time depends only on the keys' ranking, so we assume that the input consists of the first $n$ integers $\{1, 2, \ldots, n\}$, and key $k$ is chosen as the pivot with probability $1/n$. Then, the following recurrence holds:



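$$\mathcal{L}_n = n - 1 + \mathcal{L}_{k-1} + \mathcal{L}_{n-k},$$
where, conditional on the pivot $k$, the costs $\mathcal{L}_{k-1}$ and $\mathcal{L}_{n-k}$ of the two sublists are independent.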



Now, let $L_n(u) = \mathrm{E}[u^{\mathcal{L}_n}] = \sum_{k \ge 0} \Pr[\mathcal{L}_n = k]\, u^k$ be the probability generating function of $\mathcal{L}_n$. The above recurrence implies that
$$L_n(u) = \frac{u^{n-1}}{n} \sum_{i=0}^{n-1} L_i(u) L_{n-1-i}(u), \tag{1}$$
with $L_0(u) = 1$. Observe that the same recurrences are obtained when analyzing the total path length $\mathcal{L}_n$ of a binary search tree built over a random set of $n$ keys (cf. [12, 14]). Finally, let us define a bivariate generating function $L(z,u) = \sum_{n \ge 0} L_n(u) z^n$. Then (1) leads to the following partial-differential functional equation
$$\frac{\partial L(z,u)}{\partial z} = L^2(zu, u), \qquad L(0,u) = 1. \tag{2}$$



Observe also that $L(z,1) = (1-z)^{-1}$.
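Recurrence (1) also gives a practical way to tabulate the exact distribution of $\mathcal{L}_n$ for small $n$: represent each $L_i(u)$ by its list of coefficients, multiply and add polynomials, then shift by $u^{n-1}$ and divide by $n$. The sketch below does this with exact rationals (helper names are our own) and checks the result against the mean $2(n+1)H_n - 4n$ derived in Section 3 and against the bound $\mathcal{L}_n \le \binom n2$.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (constant term first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        if a:
            for j, b in enumerate(q):
                out[i + j] += a * b
    return out


def comparison_distributions(n_max):
    """dists[n][k] = Pr[L_n = k] for n = 0..n_max, computed from recurrence (1)."""
    dists = [[Fraction(1)]]  # L_0(u) = 1
    for n in range(1, n_max + 1):
        acc = []
        for i in range(n):
            prod = poly_mul(dists[i], dists[n - 1 - i])
            acc += [Fraction(0)] * (len(prod) - len(acc))  # grow to the needed degree
            for k, c in enumerate(prod):
                acc[k] += c
        # Multiply by u^(n-1) / n: shift the coefficients and divide by n.
        dists.append([Fraction(0)] * (n - 1) + [c / n for c in acc])
    return dists


dists = comparison_distributions(12)
```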

The moments of $\mathcal{L}_n$ are relatively easy to compute since they are related to derivatives of $L_n(u)$ at $u = 1$. Hennequin [7] analyzed these carefully and computed exactly the first five cumulants, and obtained an asymptotic formula for all cumulants as $n \to \infty$.

The main open problem is to find the limiting distribution of $\mathcal{L}_n$. Régnier [15] proved that the limiting distribution of $(\mathcal{L}_n - \mathrm{E}[\mathcal{L}_n])/n$ exists, while Rösler [16, 17] characterized this limiting distribution as a fixed point of a contraction satisfying a recurrence equation. A partial-differential functional equation seemingly similar to (2) was studied recently by Jacquet and Szpankowski [10]. They analyzed a digital search tree for which the bivariate generating function $L(z,u)$ (in the so-called symmetric case) satisfies
$$\frac{\partial L(z,u)}{\partial z} = L^2\left(\tfrac12 zu, u\right)$$
with $L(0,u) = 1$. The above equation was solved asymptotically in [10], and this led to a limiting normal distribution of the path length $\mathcal{L}_n$ in digital search trees. While the above equation and (2) look similar, there are crucial differences. Among them, the most important is the contracting factor $\frac12$ in the right-hand side of the above. Needless to say, we know that (2) does not lead to a normal distribution, since the third moment of $(\mathcal{L}_n - \mathrm{E}[\mathcal{L}_n])/n$ does not tend to zero (cf. [14]).
In view of the above discussion, a less ambitious goal was set, namely that of computing the large deviations of $\mathcal{L}_n$, i.e., $\Pr[|\mathcal{L}_n - \mathrm{E}[\mathcal{L}_n]| > \varepsilon\mathrm{E}[\mathcal{L}_n]]$ for $\varepsilon > 0$. Hennequin [7] used Chebyshev's inequality to show that the above probability is $O(1/(\varepsilon^2\log^2 n))$. Recently, Rösler [16] showed that this probability is in fact $O(n^{-k})$ for any fixed $k$, and soon after McDiarmid and Hayward [13] used the powerful method of bounded differences, known also as Azuma's inequality, to obtain an even better estimate of the tail (see (19) and the comment after Theorem 1 in Section 2).

In this paper, we improve on some of the previous results. First of all, we establish an integral equation for the probability density of $\mathcal{L}_n$, and using this we derive a left tail and a right tail of the large deviations of $\mathcal{L}_n$. We demonstrate that the left tail is much thinner (i.e., double exponential) than the right tail, which is roughly exponential. We establish these results using formal asymptotic methods of applied mathematics such as the WKB method and matched asymptotics (cf. [2, 6]).

This paper is a conference version of the full paper [11], which contains all proofs, while here we rather sketch some of the derivations. The full version can be found at http://www.cs.purdue.edu/homes/spa/current.html.

2 Formulation and Summary of Results 

As before, we let $\mathcal{L}_n$ be the number of key comparisons made when Quicksort sorts $n$ keys. The probability generating function of $\mathcal{L}_n$ becomes
$$L_n(u) = \mathrm{E}[u^{\mathcal{L}_n}] = \sum_{k=0}^{\infty} \Pr[\mathcal{L}_n = k]\, u^k. \tag{3}$$
The upper limit in this sum may be truncated at $k = \binom n2$, since this is clearly an upper bound on the number of comparisons needed to sort $n$ keys.

The generating function $L_n(u)$ satisfies (1), which we repeat below (cf. also [5, 14, 15, 16]):
$$L_{n+1}(u) = \frac{u^n}{n+1} \sum_{i=0}^{n} L_i(u) L_{n-i}(u), \qquad L_0(u) = 1. \tag{4}$$
Note that $L_n(1) = 1$ for all $n \ge 0$, and that the probability $\Pr[\mathcal{L}_n = k]$ may be recovered from the Cauchy integral
$$\Pr[\mathcal{L}_n = k] = \frac{1}{2\pi i} \oint_C \frac{L_n(u)}{u^{k+1}}\, du. \tag{5}$$
Here $C$ is any closed loop about the origin.
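For numerical work, the Cauchy integral (5) can be discretised on the unit circle. Because $L_n(u)$ is a polynomial of degree at most $\binom n2$, the trapezoid rule with $M > \binom n2$ equally spaced points on $|u| = 1$ recovers each coefficient exactly, up to floating-point rounding (it is then a discrete Fourier transform). The sketch below evaluates $L_n(u)$ at complex $u$ via recurrence (4); the function names are hypothetical and the choice $M = \binom n2 + 1$ is ours.

```python
import cmath


def pgf_values(n, u):
    """[L_0(u), ..., L_n(u)] at a (complex) point u, from recurrence (4)."""
    L = [1 + 0j]
    for m in range(n):
        s = sum(L[i] * L[m - i] for i in range(m + 1))
        L.append(u ** m * s / (m + 1))
    return L


def prob_via_cauchy(n, k):
    """Pr[L_n = k] from (5), with C the unit circle sampled at M points."""
    M = n * (n - 1) // 2 + 1  # more points than the degree of L_n
    total = 0j
    for j in range(M):
        u = cmath.exp(2j * cmath.pi * j / M)
        total += pgf_values(n, u)[n] / u ** k
    return (total / M).real
```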

In Section 3, we analyze (4) asymptotically for $n \to \infty$ and for various ranges of $u$. The most important scale is where $n \to \infty$ with $u - 1 = O(n^{-1})$, which corresponds to $k = \mathrm{E}[\mathcal{L}_n] + O(n) = 2n\log n + O(n)$. Most of the probability mass is concentrated in this range of $k$. As mentioned before, the existence of a limiting distribution of $(\mathcal{L}_n - \mathrm{E}[\mathcal{L}_n])/n$ as $n \to \infty$ was established in [15, 16], though there seems to be little known about this distribution (cf. [3, 5, 7, 13, 18]). Numerical and simulation results in [3, 5] show that the distribution is highly asymmetric and that the right tail seems much thicker than the left tail. It is also of interest to estimate these tails (cf. [13, 16]), as they give the probabilities that the number of key comparisons will deviate significantly from $\mathrm{E}[\mathcal{L}_n]$, which is well known to be asymptotically equal to $2n\log n$ as $n \to \infty$ (cf. [7, 14]).

For $u - 1 = w/n = O(n^{-1})$ and $n \to \infty$, we shall show that
$$L_n(u) = \exp\left(\frac{A_n w}{n}\right)\left[G_0(w) + \frac{\log n}{n} G_1(w) + \frac1n G_2(w) + o(n^{-1})\right], \tag{6}$$



where $A_n = \mathrm{E}[\mathcal{L}_n]$. The leading term $G_0(w)$ satisfies a non-linear integral equation. Indeed, we find that (cf. (39))
$$e^{-w} G_0(w) = \int_0^1 e^{2w\phi(x)}\, G_0(wx)\, G_0(w - wx)\, dx, \tag{7}$$
$$G_0(0) = 1, \qquad G_0'(0) = 0, \tag{8}$$
where
$$\phi(x) = x\log x + (1-x)\log(1-x) \tag{9}$$
is minus the entropy of the Bernoulli($x$) distribution. Furthermore, the correction terms $G_1(\cdot)$ and $G_2(\cdot)$ satisfy linear integral equations (cf. (40)-(41)).
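Equations (7)-(9) already pin down the low-order moments of the limit law. Differentiating (7) once at $w = 0$ and using (8) only reproduces the identity $\int_0^1 (1 + 2\phi(x))\,dx = 0$; differentiating twice gives $1 + \sigma^2 = 4\int_0^1 \phi(x)^2\,dx + \frac23\sigma^2$ for $\sigma^2 = G_0''(0)$, that is, $\sigma^2 = 12\int_0^1 \phi(x)^2\,dx - 3$. Evaluating the integral gives $7 - 2\pi^2/3$, the leading coefficient of the variance (30), since $H_n^{(2)} \to \pi^2/6$. The short Python check below uses a midpoint rule; the step count is an arbitrary choice.

```python
import math


def phi(x):
    """phi(x) = x log x + (1 - x) log(1 - x), equation (9)."""
    return x * math.log(x) + (1 - x) * math.log(1 - x)


def midpoint(f, steps=200_000):
    """Midpoint-rule approximation of the integral of f over (0, 1)."""
    h = 1.0 / steps
    return h * sum(f((i + 0.5) * h) for i in range(steps))


first_order = midpoint(lambda x: 1 + 2 * phi(x))    # first derivative of (7): should vanish
sigma2 = 12 * midpoint(lambda x: phi(x) ** 2) - 3   # second derivative of (7)
print(first_order, sigma2, 7 - 2 * math.pi ** 2 / 3)
```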

By using (6) in (5) and asymptotically approximating the Cauchy integral, we obtain
$$\Pr[\mathcal{L}_n - \mathrm{E}(\mathcal{L}_n) = ny] \sim \frac1n P(y), \tag{10}$$
where
$$P(y) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} e^{-yw} G_0(w)\, dw, \tag{11}$$
$$G_0(w) = \int_{-\infty}^{\infty} e^{yw} P(y)\, dy, \tag{12}$$
and $c$ is a constant. Hence, $G_0(w)$ is the moment generating function of the density $P(y)$.

Now, we can summarize our main findings:

Theorem 1. Consider the Quicksort algorithm that sorts $n$ randomly selected keys.

(i) The limiting density $P(y)$ satisfies
$$P(y+1) = \int_0^1 \int_{-\infty}^{\infty} P\left(xt + \frac{y - 2\phi(x)}{2(1-x)}\right) P\left(-(1-x)t + \frac{y - 2\phi(x)}{2x}\right) dt\, dx \tag{13}$$
and
$$\int_{-\infty}^{\infty} P(y)\, dy = 1, \qquad \int_{-\infty}^{\infty} y P(y)\, dy = 0. \tag{14}$$



(ii) The left tail of the distribution satisfies 

2 1 



Pr[£„ - E(£„) < nz] 



exp I —a exp 



(3-z 



2 - log“^ 2 



7T v'2 log 2 - 1 

for $n \to \infty$, $z \to -\infty$, where $\alpha = 0.205021\ldots$ and $\beta$ is a constant.
(iii) The right tail becomes 

^ («J,-5+27+log 2) 



(15) 



Pr[£„ - E(£„) > ny] 



— 1 

rw. 2e« 



X ex' 



:p[-yw;* + 



du] 



(16) 



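for $n \to \infty$, $y \to +\infty$. Here $C_*$ is a constant, $\gamma$ is Euler's constant, while $w_* = w_*(y)$ is the solution to
$$y = \frac{2}{w_*}\, e^{w_*} \tag{17}$$
that asymptotically becomes
$$w_* = \log\left(\frac y2\right) + \log\log\left(\frac y2\right) + o(1) \tag{18}$$
for $y \to \infty$ (cf. (51)).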



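Finally, we relate our results for the tails to those of McDiarmid and Hayward [13]. These authors showed that
$$\Pr[|\mathcal{L}_n - \mathrm{E}(\mathcal{L}_n)| > \varepsilon\mathrm{E}(\mathcal{L}_n)] = \exp\bigl[-2\varepsilon\log n\,(\log\log n - \log(1/\varepsilon) + O(\log\log\log n))\bigr], \tag{19}$$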

which holds for $n \to \infty$ and $\varepsilon$ in the range
$$\frac{1}{\log n} < \varepsilon \le 1. \tag{20}$$



As pointed out in [13], this estimate is not very precise if, say, $\varepsilon = O(\log\log n/\log n)$. From Theorem 1 we conclude that (since the right tail is much thicker than the left tail)
$$\Pr[|\mathcal{L}_n - \mathrm{E}(\mathcal{L}_n)| > ny] \sim C\,k(y)\,e^{\Phi(y)}, \qquad y \to \infty, \tag{21}$$
where $C$ is a constant and



<P{y) = -yw* + 



to, 2 e« 2 e"’» 

du, y = 



U)* 



( 22 ) 



k(y) 



-27-log 2) 



W: 



,\/Wt, - 1 
We have not been able to determine the upper limit on $y$ for the validity of (21). However, it is easy to see that (21) reduces to (19) if we set $y = \varepsilon\mathrm{E}(\mathcal{L}_n)/n = 2\varepsilon\log n + \varepsilon(2\gamma - 4) + O(\varepsilon(\log n)/n)$ and use (18) to approximate $w_*$ as $y \to \infty$. This yields
$$\begin{aligned}
\Phi(y) &= y\left[-\log\left(\frac y2\right) - \log\log\left(\frac y2\right) + 1 + o(1)\right] \\
&= -2\varepsilon\log n\,\bigl[\log\log n - \log(1/\varepsilon) + \log\log(\varepsilon\log n) - 1\bigr] + o(\log n),
\end{aligned} \tag{23}$$
which agrees precisely with the estimate (19), and also explicitly identifies the $O(\log\log\log n)$ error term. This suggests that (21) applies for $y$ as large as $2\log n$, though it cannot hold for $y$ as large as $n/2$ in view of the fact that $\Pr[\mathcal{L}_n = k] = 0$ for $k > \binom n2$. An important open problem is obtaining an accurate numerical approximation to the constant $C$. This would likely involve the numerical solution of the integral equation for $G_0(w)$.

3 Analysis 

We study (4) asymptotically, for various ranges of n and u, and then we evaluate 
the tails of the distribution. 

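First we consider the limit $u \to 1$ with $n$ fixed. Using the Taylor expansion

$$\begin{aligned} L_n(u) &= 1 + A_n(u-1) + B_n(u-1)^2 + O((u-1)^3) \\ &= \exp\left[A_n(u-1) + \left(B_n - \tfrac{1}{2}A_n^2\right)(u-1)^2 + O((u-1)^3)\right], \end{aligned} \tag{24}$$

we find from (4) that $A_n = L_n'(1)$ and $B_n = L_n''(1)/2$ satisfy the linear recurrence equations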



$$A_{n+1} = n + \frac{1}{n+1}\sum_{i=0}^{n}\big[A_i + A_{n-i}\big] = n + \frac{2}{n+1}\sum_{i=0}^{n} A_i, \qquad A_0 = 0, \tag{25}$$

$$B_{n+1} = \binom{n}{2} + \frac{2n}{n+1}\sum_{i=0}^{n} A_i + \frac{1}{n+1}\sum_{i=0}^{n}\big[2B_i + A_iA_{n-i}\big], \qquad B_0 = 0. \tag{26}$$

These are easily solved using either generating functions, or by multiplying (25) and (26) by $n + 1$ and then differencing with respect to $n$. The final result is (cf. [12, 14])

$$A_n = 2(n+1)H_n - 4n, \tag{27}$$

$$B_n = 2(n+1)^2H_n^2 - (8n+2)(n+1)H_n + \tfrac{1}{2}n(23n+17) - 2(n+1)^2H_n^{(2)}. \tag{28}$$

Here $H_n = 1 + \frac{1}{2} + \cdots + \frac{1}{n}$ is the harmonic number, and $H_n^{(2)} = \sum_{k=1}^{n} k^{-2}$ is the harmonic number of second order. In terms of $A_n$ and $B_n$, the mean and variance of $L_n$ are given by

$$E[L_n] = A_n = 2(n+1)H_n - 4n, \tag{29}$$

$$\operatorname{Var}[L_n] = L_n''(1) + L_n'(1) - [L_n'(1)]^2 = 2B_n + A_n - A_n^2 = 7n^2 - 4(n+1)^2H_n^{(2)} - 2(n+1)H_n + 13n. \tag{30}$$
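As a sanity check on (25)-(30), here is a small Python sketch (ours, not from the paper). It iterates the recurrences (25) and (26) in exact rational arithmetic, separately builds the exact distribution of $L_n$ for small $n$ from the splitting rule $L_{n+1} = n + L_I + L'_{n-I}$ with $I$ uniform on $\{0, \ldots, n\}$ (our assumption about the form of recurrence (4), which is not reproduced in this section), and checks both against the closed forms (27), (28) and (30).

```python
from fractions import Fraction
from math import comb


def moments_by_recurrence(n_max):
    """Exact A_n and B_n for n = 0..n_max from recurrences (25) and (26)."""
    A, B = [Fraction(0)], [Fraction(0)]
    for n in range(n_max):
        sum_a = sum(A[: n + 1])
        cross = sum(2 * B[i] + A[i] * A[n - i] for i in range(n + 1))
        A.append(n + Fraction(2, n + 1) * sum_a)
        B.append(comb(n, 2) + Fraction(2 * n, n + 1) * sum_a + cross / (n + 1))
    return A, B


def harmonic(n, order=1):
    return sum((Fraction(1, k ** order) for k in range(1, n + 1)), Fraction(0))


def closed_forms(n):
    """A_n from (27), B_n from (28) and Var[L_n] from (30)."""
    h, h2 = harmonic(n), harmonic(n, 2)
    a = 2 * (n + 1) * h - 4 * n
    b = (2 * (n + 1) ** 2 * h ** 2 - (8 * n + 2) * (n + 1) * h
         + Fraction(n * (23 * n + 17), 2) - 2 * (n + 1) ** 2 * h2)
    var = 7 * n ** 2 - 4 * (n + 1) ** 2 * h2 - 2 * (n + 1) * h + 13 * n
    return a, b, var


def exact_distribution(n_max):
    """dist[n] maps k to Pr[L_n = k]; the pivot rank I is uniform on {0..n}."""
    dist = [{0: Fraction(1)}]
    for n in range(n_max):
        new = {}
        for i in range(n + 1):
            for x, px in dist[i].items():
                for y, py in dist[n - i].items():
                    k = n + x + y  # n comparisons to partition, then recurse
                    new[k] = new.get(k, Fraction(0)) + px * py / (n + 1)
        dist.append(new)
    return dist


A, B = moments_by_recurrence(12)
dist = exact_distribution(12)
for n in range(13):
    a, b, var = closed_forms(n)
    mean = sum(k * pk for k, pk in dist[n].items())
    second = sum(k * k * pk for k, pk in dist[n].items())
    assert A[n] == a and B[n] == b
    assert mean == a and second - mean ** 2 == var
print("E[L_3] =", A[3], " Var[L_3] =", closed_forms(3)[2])
```

For $n = 3$ it prints E[L_3] = 8/3 and Var[L_3] = 2/9, matching a direct count: three keys need 2 comparisons when the first pivot is the median and 3 otherwise.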

Asymptotically, for $n \to \infty$, we also obtain

$$A_n = 2n\log n + (2\gamma - 4)n + 2\log n + 2\gamma + 1 + \frac{5}{6n} + O(n^{-2}), \tag{31}$$

$$2B_n + A_n - A_n^2 = \left(7 - \frac{2}{3}\pi^2\right)n^2 - 2n\log n + \left(17 - 2\gamma - \frac{4}{3}\pi^2\right)n + o(n). \tag{32}$$

These expressions will be used in order to asymptotically match the expansion for $u \to 1$ and $n$ fixed to those that will apply for other ranges of $n$ and $u$.

Next we consider the limit $u \to 1$, $n \to \infty$ with $w = n(u - 1)$ held fixed. We define $G(\cdot)$ by

$$L_n(u) = \exp(A_n w/n)\,G(w; n) = \exp\big(A_n(u-1)\big)\,G\big(n(u-1); n\big). \tag{33}$$

From (33) and the identity $L_n'(1) = A_n = E[L_n]$ we find that for all $n$

$$G(0; n) = 1 \quad\text{and}\quad G'(0; n) = 0. \tag{34}$$

We assume that for $n \to \infty$, $G(w; n)$ has an asymptotic expansion of the form

$$G(w; n) \sim G_0(w) + a_1(n)G_1(w) + a_2(n)G_2(w) + \cdots, \tag{35}$$

where $a_j(n)$ is an asymptotic sequence as $n \to \infty$, i.e., $a_{j+1}(n)/a_j(n) \to 0$ as $n \to \infty$. We will eventually obtain

$$a_1(n) = \frac{\log n}{n}, \qquad a_2(n) = \frac{1}{n}, \tag{36}$$

so we use this form from the beginning. Note that $G_0(w)$ is the moment generating function of the limiting density of $(L_n - E[L_n])/n$, which is discussed in [16]. The conditions (34) imply that

$$G_0(0) = 1; \qquad G_1(0) = G_2(0) = \cdots = 0, \tag{37}$$

$$G_0'(0) = G_1'(0) = G_2'(0) = \cdots = 0. \tag{38}$$

In fact, in the full version of this paper [11] we prove that the $G_i(w)$ satisfy the following integral equations:

$$e^{-w}G_0(w) = \int_0^1 e^{2w\phi(x)}\,G_0(wx)\,G_0(w - wx)\,dx, \tag{39}$$

$$e^{-w}G_1(w) = 2\int_0^1 \frac{1}{x}\,e^{2w\phi(x)}\,G_0(w - wx)\,G_1(wx)\,dx, \tag{40}$$
$$\begin{aligned} e^{-w}\Big[G_0(w) + \tfrac{1}{2}w^2G_0(w) + wG_0'(w) + G_2(w)\Big] = {}& G_0(w) + w\int_0^1 [2\psi(x) + 3]\,e^{2w\phi(x)}G_0(wx)G_0(w - wx)\,dx \\ &+ 2\int_0^1 \frac{\log x}{x}\,e^{2w\phi(x)}G_0(w - wx)G_1(wx)\,dx \\ &+ 2\int_0^1 \frac{1}{x}\,e^{2w\phi(x)}G_0(w - wx)G_2(wx)\,dx. \end{aligned} \tag{41}$$

Equations (39)-(41), along with (37) and (38), are integral equations for the first three terms in the series (35). The leading order equation (39) was previously obtained in [5], using probabilistic arguments.

Now, we shall examine the leading term $G_0(w)$, and we first consider the limit $w \to -\infty$. As $w \to -\infty$, the "kernel" $\exp[2w\phi(x)]$ in (39) is sharply concentrated near $x = 1/2$, and behaves as a multiple of $\delta(x - 1/2)$. Thus we treat (39) as a Laplace type integral (cf. [8]). The case $w \to \infty$ can be treated in a similar manner. We prove the following asymptotic expansions (see [11] for details):

$$G_0(w) \sim \frac{2\sqrt{2}}{\sqrt{\pi\log 2}}\,\sqrt{-w}\,\exp\left[\beta w + \left(\frac{1}{\log 2} - 2\right)w\log(-w)\right], \qquad w \to -\infty, \tag{42}$$

$$G_0(w) \sim C_*\exp\left[\int_1^w \frac{2e^u}{u}\,du - w^2 - (2\gamma + 2\log 2 - 1)w\right], \qquad w \to \infty. \tag{43}$$



Finally, we analyze the tails of the limiting distribution. Using the approximation (33) for $u - 1 = O(n^{-1})$, we obtain

$$\Pr[L_n - E(L_n) = ny] = \frac{1}{2\pi i}\oint z^{-(ny + E(L_n) + 1)}L_n(z)\,dz \sim \frac{1}{n}\,\frac{1}{2\pi i}\int_{Br} e^{-yw}G_0(w)\,dw = \frac{1}{n}P(y), \tag{44}$$

where $Br = (c - i\infty, c + i\infty)$, for some constant $c$, is any vertical contour in the $w$-plane. Here we have set $z = 1 + w/n$ in the integral. It follows that

$$G_0(w) = \int_{-\infty}^{\infty} e^{wy}P(y)\,dy, \tag{45}$$

so that $G_0(w)$ is the moment generating function of the density $P(y)$. In view of (37) and (38) we have $\int_{-\infty}^{\infty} P(y)\,dy = 1$ and $\int_{-\infty}^{\infty} yP(y)\,dy = 0$.

Observe that, using (33), (35), and (36), we can refine the approximation (44) to

$$\Pr[L_n - E(L_n) = ny] = \frac{1}{n}\left(P(y) + \frac{\log n}{n}\big[P_1(y) + P''(y)\big] + \frac{1}{n}\Big[P_2(y) + P'(y) + \tfrac{1}{2}yP''(y) + (\gamma - 2)P''(y)\Big] + o(n^{-1})\right)$$



354 



C. Knessl and W. Szpankowski 



where

$$P_k(y) = \frac{1}{2\pi i}\int_{Br} e^{-yw}G_k(w)\,dw$$

for $k = 1, 2$.

An integral equation for $P(y)$ can easily be derived from (39). We multiply (39) by $e^{-yw}/(2\pi i)$ and integrate over a contour $Br$ in the $w$-plane:

$$\begin{aligned} P(y+1) &= \frac{1}{2\pi i}\int_{Br} e^{-w(y+1)}G_0(w)\,dw \\ &= \int_0^1\!\int_{-\infty}^{\infty} P\!\left(xt + \frac{y - 2\phi(x)}{2(1-x)}\right)P\!\left(-(1-x)t + \frac{y - 2\phi(x)}{2x}\right)dt\,dx, \end{aligned} \tag{46}$$

where we used the well-known identity $\frac{1}{2\pi i}\int_{Br} e^{-yw}\,dw = \delta(y)$, where $\delta(y)$ is Dirac's delta function (cf. [1]). The last expression is precisely (13). The solution to this integral equation is not unique: if $P(y)$ is a solution, so is $P(y + c)$ for any $c$.

We now study $P(y) = (2\pi i)^{-1}\int_{Br} e^{-yw}G_0(w)\,dw$ as $y \to \pm\infty$. We argue that the asymptotic expansion of the integral will be determined by a saddle point $w = s(y)$, which satisfies $s(y) \to \pm\infty$ as $y \to \pm\infty$. Thus for $y \to -\infty$, we can use the approximation (42) for $G_0(w)$, which yields

$$P(y) \sim \frac{2\sqrt{2}}{\sqrt{\pi\log 2}}\,\frac{1}{2\pi i}\int_{Br}\sqrt{-w}\,e^{(\beta - y)w}\exp\left[\left(\frac{1}{\log 2} - 2\right)w\log(-w)\right]dw. \tag{47}$$

This integrand has a saddle point where

$$\frac{d}{dw}\left[-(y - \beta)w + \left(\frac{1}{\log 2} - 2\right)w\log(-w)\right] = 0,$$

so that

$$w = -\frac{1}{e}\exp\left(\frac{\beta - y}{2 - 1/\log 2}\right) = w(y),$$

which satisfies $w(y) \to -\infty$ as $y \to -\infty$. Then the standard saddle point approximation to (47) yields

$$P(y) \sim \frac{2}{\pi e\sqrt{2\log 2 - 1}}\exp\left(\frac{\beta - y}{2 - 1/\log 2}\right)\exp\left[-\frac{2 - 1/\log 2}{e}\exp\left(\frac{\beta - y}{2 - 1/\log 2}\right)\right] \tag{48}$$

for $y \to -\infty$. Thus, the left tail is very small and the behavior of $P(y)$ as $y \to -\infty$ is similar to that of an extreme value distribution (i.e., a double-exponential distribution).
Now take $y \to +\infty$ and use (43) to get

$$P(y) \sim \frac{C_*}{2\pi i}\int_{Br} e^{-yw}\exp\left[\int_1^w \frac{2e^u}{u}\,du - w^2 - (2\gamma + 2\log 2 - 1)w\right]dw.$$

The saddle point now satisfies

$$\frac{d}{dw}\left[-yw + \int_1^w \frac{2e^u}{u}\,du\right] = 0, \tag{49}$$

or $y = 2e^w/w$. Let $w_* = w_*(y)$ be the solution to (17) that satisfies $w_* \to \infty$ as $y \to \infty$. Then expanding the integrand in (49) about $w = w_*(y)$ and using again the saddle point approximation yields

$$P(y) \sim \frac{C_*}{\sqrt{2\pi y}\,\sqrt{1 - 1/w_*}}\exp\left[-yw_* + \int_1^{w_*}\frac{2e^u}{u}\,du\right]\exp\big[-w_*^2 - (2\gamma + 2\log 2 - 1)w_*\big] \tag{50}$$

as $y \to \infty$, from which (16) easily follows. Thus for $y \to \infty$ we have $P(y) = \exp[O(-y\log y)]$, and hence the right tail is thinner than that of an extreme value distribution. From (17) it is easy to show that

$$w_* = \log\frac{y}{2} + \log\log\frac{y}{2} + o(1). \tag{51}$$
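The right-tail quantities are equally easy to evaluate. In this Python sketch, $w_*$ is obtained from (17) by Newton's method applied to $w - \log w - \log(y/2) = 0$, and the integral $\int_1^{w_*} 2e^u/u\,du$ in $\Phi(y)$ from (22) is computed with a composite Simpson rule; both numerical methods, and the function names, are our choices rather than the authors'.

```python
import math


def w_star(y, tol=1e-13):
    """Root w_* > 1 of y = 2 e^w / w, i.e. equation (17); needs y > 2e."""
    if y <= 2.0 * math.e:
        raise ValueError("(17) has a root with w > 1 only for y > 2e")
    target = math.log(y / 2.0)
    # g(w) = w - log w - log(y/2) is increasing and convex for w > 1: Newton's method.
    w = max(2.0, target + math.log(target))
    for _ in range(100):
        step = (w - math.log(w) - target) / (1.0 - 1.0 / w)
        w -= step
        if abs(step) < tol * w:
            break
    return w


def integral_2eu_over_u(a, b, n=4000):
    """Composite Simpson rule for the integral of 2 e^u / u over [a, b]; n must be even."""
    def f(u):
        return 2.0 * math.exp(u) / u

    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(a + k * h)
    return total * h / 3.0


def big_phi(y):
    """Phi(y) = -y w_* + int_1^{w_*} 2 e^u / u du, as in (22)."""
    w = w_star(y)
    return -y * w + integral_2eu_over_u(1.0, w)


for y in (1e2, 1e4, 1e6):
    w = w_star(y)
    approx = math.log(y / 2) + math.log(math.log(y / 2))  # leading terms of (51)
    print(f"y={y:.0e}  w*={w:.4f}  (51)~{approx:.4f}  Phi/y={big_phi(y) / y:.4f}")
```

The printed values show how slowly (51) converges: the neglected $o(1)$ term is of order $\log\log y/\log y$.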

For fixed $z$ and $y$ we have, as $n \to \infty$,

$$\Pr[L_n - E(L_n) \le nz] \sim \int_{-\infty}^{z} P(y)\,dy, \tag{52}$$

$$\Pr[L_n - E(L_n) \ge ny] \sim \int_{y}^{\infty} P(u)\,du. \tag{53}$$

If $z \to -\infty$ or $y \to +\infty$, then these integrals may be evaluated asymptotically using (48) and (50), and we obtain the results (15) and (16), respectively.
Abstract. We study dense instances of optimization problems with variables taking values in $\mathbb{Z}_p$. Specifically, we study systems of functions from $\mathbb{Z}_p^k$ to $\mathbb{Z}_p$ where the objective is to make as many functions as possible attain the value zero. We generalize earlier sampling methods and thereby construct a randomized polynomial time approximation scheme for instances with $\Theta(n^k)$ functions where $n$ is the number of variables occurring in the functions.



1 Introduction 

Arora, Karger and Karpinski [1] have constructed a randomized polynomial time 
approximation scheme for dense instances of a number of Max-SNP problems, 
including Max Cut. They formulate the problems as integer programs with certain properties, and then construct an algorithm finding, in probabilistic polynomial time, a solution accurate enough to give a relative error of $\varepsilon$, for any $\varepsilon > 0$. Fernandez de la Vega [3] has also constructed a randomized polynomial time approximation scheme for dense instances of Max Cut, independently of Arora, Karger and Karpinski. It is natural to look for generalizations of these ideas to other problems; for instance one can turn to problems with variables taking values in $\mathbb{Z}_p$ rather than $\mathbb{Z}_2$. The method of Arora, Karger and Karpinski [1] does not seem to apply in this case since the integer programs used to express such generalizations do not have the properties required by the method.

The algorithm of Fernandez de la Vega [3] selects a random subset W of the 
vertices in the graph G = (V,E). This subset has constant size. Then, V \ W 
is partitioned randomly into smaller sets. These sets are used to construct a 
cut in G by exhaustive search. Finally, it turns out that the randomly selected 
subset W has, with high probability, the property that the cut found by the 
exhaustive search is close to the optimum cut in dense graphs. Goldreich, Gold- 
wasser and Ron [5] generalize these ideas to several other problems, and express 
the key idea somewhat differently. In their randomized polynomial time approximation scheme for Max Cut, they partition the vertices of the graph $G = (V, E)$ into a constant number of disjoint sets $V^i$. For each $i$ they find a cut in $V^i$ by

M. Luby, J. Rolim, and M. Serna (Eds.): RANDOM’98, LNCS 1518, pp. 357-368, 1998. 

© Springer-Verlag Berlin Heidelberg 1998 
selecting a small subset $U^i$ of the vertices in $V \setminus V^i$. Then they try all possible partitions $\pi$ of $U^i$ into two parts. Each partition $\pi$ induces a cut in $V^i$. Finally, when $\pi$ is exactly the partition from the maximum cut restricted to the subset in question, the weight of the induced cut should, with high probability, be close to the weight of the maximum cut. In this paper, we continue this line of research by generalizing the above ideas to arbitrary non-boolean optimization problems.

Frieze and Kannan [4] have constructed a polynomial time approximation 
scheme for all dense Max-SNP problems. Their algorithm is a polynomial time 
approximation scheme for every problem that can be described as an instance of 
Max $k$-Function Sat with $\Theta(n^k)$ functions. Dense instances of Max $k$-Function
Sat mod p do not seem to be describable in this manner, and on top of that, the 
algorithm proposed in this paper has a simpler structure and shorter running 
time than their algorithm. 

2 Preliminaries 

Definition 1. We denote by Max Ek-Lin mod p the problem in which the input 
consists of a system of linear equations mod p in n variables. Each equation 
contains exactly k variables. The objective is to find the assignment maximizing 
the number of satisfied equations. 

Definition 2. We denote by Max k-Function Sat the problem in which the input 
consists of a number of boolean functions in n boolean variables. Each function 
depends on k variables. The objective is to find the assignment maximizing the 
number of satisfied functions. 

Definition 3. We denote by Max k-Function Sat mod p the problem in which the input consists of a number of functions $\mathbb{Z}_p^k \to \mathbb{Z}_p$ in $n$ variables. A function is satisfied if it evaluates to zero. The objective is to find the assignment maximizing the number of satisfied functions.

Definition 4. The class Max-SNP is the class of optimization problems which can be written in the form

$$\max_S \big|\{x : \Phi(I, S, x)\}\big|, \tag{1}$$

where $\Phi$ is a quantifier-free formula, $I$ an instance and $S$ a solution. (This class is called Max-SNP$_0$ in [9].)

For instance, the Max Cut problem is in Max-SNP since it can be described as

$$\max_S \big|\{(x, y) : E(x, y) \wedge S(x) \wedge \neg S(y)\}\big|, \tag{2}$$

where $E(x, y)$ is true if there is an edge $(x, y)$ in the graph.
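To make (2) concrete, the following Python sketch (our illustration, practical only for tiny graphs) evaluates the Max-SNP expression for every candidate $S$ and keeps the maximum. Because $E(x, y)$ is symmetric, each cut edge is counted exactly once, from its endpoint in $S$, so the maximum equals the size of a maximum cut.

```python
from itertools import product


def maxsnp_objective(edges, in_s):
    """|{(x, y) : E(x, y) and S(x) and not S(y)}| for one candidate solution S."""
    return sum(1 for (x, y) in edges if in_s[x] and not in_s[y])


def max_cut_via_maxsnp(vertices, undirected_edges):
    """Maximize (2) over all subsets S by brute force."""
    # E(x, y) is symmetric, so store both orientations of each undirected edge;
    # then every cut edge is counted exactly once (from its endpoint inside S).
    edges = set(undirected_edges) | {(y, x) for (x, y) in undirected_edges}
    best = 0
    for bits in product((False, True), repeat=len(vertices)):
        in_s = dict(zip(vertices, bits))
        best = max(best, maxsnp_objective(edges, in_s))
    return best


five_cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(max_cut_via_maxsnp(range(5), five_cycle))  # 4
```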

Definition 5. A polynomial time approximation scheme for a maximization problem $P$ with objective function $m(\cdot)$ is a family $\{A_\varepsilon\}$, $\varepsilon > 0$, of algorithms with polynomial running time (for fixed $\varepsilon$) such that $m(A_\varepsilon(I)) \ge (1 - \varepsilon)\,\mathrm{opt}(I)$ for all instances $I$ of $P$, where $\mathrm{opt}(I)$ is the optimal value of the instance.

In this paper we consider randomized polynomial time approximation schemes where $m(A_\varepsilon(I)) \ge (1 - \varepsilon)\,\mathrm{opt}(I)$ holds with probability at least $1 - \delta$.

3 Systems with two variables per equation 

Max E2-Lin mod $p$ is the most natural subfamily of Max $k$-Function Sat mod $p$, and for clarity we will first describe the polynomial time approximation scheme and prove the results in this setting.

We consider an unweighted system of linear equations modulo some prime $p$. There are $n$ different variables in the system. The equations are of the form

$$ax_i + bx_{i'} = c, \tag{3}$$

where $i \ne i'$, $a, b \in \mathbb{Z}_p^*$, and $c \in \mathbb{Z}_p$. We assume that there are no equivalent equations in the system, i.e., if the two equations

$$ax_i + bx_{i'} = c, \tag{4}$$

$$a'x_i + b'x_{i'} = c' \tag{5}$$

are both in the system, we assume that there is no $d \in \mathbb{Z}_p$ such that $a = da'$, $b = db'$ and $c = dc'$. We think of variable assignments as functions from the set of variables to $\mathbb{Z}_p$.

Definition 6. Denote by $S(X, \tau, x \leftarrow r)$ the number of satisfied equations with one variable from the set $X$ and $x$ as the other variable, given that the variables in $X$ are assigned values according to the function $\tau$ and $x$ is assigned the value $r$.

The algorithm we use is based on the Max Cut algorithm by Goldreich, Goldwasser and Ron [5], and we use their terminology and notation. The parameters $\ell$ and $t$ are independent of $n$. They will be determined during the analysis of the algorithm.

Algorithm 1. A randomized polynomial time approximation scheme for Max E2-Lin mod $p$ with running time $O(n^2)$.

1. Partition the variable set $V$ into $\ell$ parts $V^1, \ldots, V^\ell$, each of size $n/\ell$.

2. Choose $\ell$ sets $U^1, \ldots, U^\ell$ such that $U^i$ is a set of cardinality $t$ chosen uniformly at random from $V \setminus V^i$. Let $U = \bigcup_{i=1}^{\ell} U^i$.

3. For each of the (at most) $p^{\ell t}$ assignments $\pi: U \to \mathbb{Z}_p$, form an assignment $\Pi_\pi: V \to \mathbb{Z}_p$ in the following way:

(a) Let $\pi' = \pi$.

(b) For $i \in \{1, \ldots, \ell\}$,

(c) For each $v \in V^i$,

(d) Let $j^*(v)$ be the $j \in \mathbb{Z}_p$ which maximizes $S(U^i, \pi', v \leftarrow j)$.

(e) Define $\Pi_\pi(v) = j^*(v)$.

(f) Modify $\pi'$ such that $\pi'|_{V^i} = \Pi_\pi|_{V^i}$.

4. Let $\Pi$ be the variable assignment $\Pi_\pi$ which maximizes the number of satisfied equations.

5. Return $\Pi$.
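The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under our own conventions, not the authors' implementation: an equation $ax + by = c$ is encoded as the tuple `(a, x, b, y, c)`, the parts $V^i$ are formed by striding over variable indices, and ties in step 3(d) go to the smallest $j$. The loop over all $p^{|U|}$ assignments is exponential only in $\ell t$, which does not depend on $n$.

```python
import random
from itertools import product


def count_satisfied(equations, assign, p):
    """Number of equations a*x + b*y = c (mod p) satisfied by the assignment."""
    return sum(1 for (a, x, b, y, c) in equations
               if (a * assign[x] + b * assign[y]) % p == c)


def algorithm1(n, equations, p, ell, t, rng):
    """Sketch of Algorithm 1 for Max E2-Lin mod p on variables 0..n-1."""
    # incident[v] holds (coefficient of v, other variable w, coefficient of w, c).
    incident = [[] for _ in range(n)]
    for (a, x, b, y, c) in equations:
        incident[x].append((a, y, b, c))
        incident[y].append((b, x, a, c))
    # Step 1: partition V into ell parts V^1..V^ell of (almost) equal size.
    parts = [list(range(i, n, ell)) for i in range(ell)]
    # Step 2: U^i is a uniformly random t-subset of V \ V^i.
    samples = []
    for part in parts:
        inside = set(part)
        outside = [v for v in range(n) if v not in inside]
        samples.append(set(rng.sample(outside, t)))
    union = sorted(set().union(*samples))
    best, best_count = None, -1
    # Step 3: try every assignment pi: U -> Z_p.
    for values in product(range(p), repeat=len(union)):
        pi_prime = dict(zip(union, values))                 # step 3(a)
        assign = [0] * n
        for part, sample in zip(parts, samples):            # step 3(b)
            chosen = {}
            for v in part:                                  # step 3(c)
                # scores[j] = S(U^i, pi', v <- j) for every j in Z_p.
                scores = [0] * p
                for (a, w, b, c) in incident[v]:
                    if w in sample:
                        for j in range(p):
                            if (a * j + b * pi_prime[w]) % p == c:
                                scores[j] += 1
                chosen[v] = max(range(p), key=scores.__getitem__)  # step 3(d)
                assign[v] = chosen[v]                       # step 3(e)
            pi_prime.update(chosen)                         # step 3(f)
        count = count_satisfied(equations, assign, p)
        if count > best_count:                              # step 4
            best, best_count = assign, count
    return best                                             # step 5


# Toy dense instance: one equation per pair of variables, all consistent with a
# planted assignment, so a correct run should satisfy every equation.
rng = random.Random(1)
p, n = 3, 8
planted = [rng.randrange(p) for _ in range(n)]
equations = []
for x in range(n):
    for y in range(x + 1, n):
        a, b = rng.randrange(1, p), rng.randrange(1, p)
        equations.append((a, x, b, y, (a * planted[x] + b * planted[y]) % p))
result = algorithm1(n, equations, p, ell=2, t=2, rng=rng)
print(count_satisfied(equations, result, p), "of", len(equations))  # 28 of 28
```

On this planted toy instance, the run that tries the planted values on $U$ recovers the planted assignment exactly, so every equation ends up satisfied regardless of which sets $U^i$ were drawn.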



Our overall goal is to show that it is possible to choose the constants $\ell$ and $t$ in such a way that Algorithm 1 produces, with probability at least $1 - \delta$, an assignment with weight at least $1 - \varepsilon/c$ times the weight of the optimal assignment for instances with $cn^2$ equations. In the analysis we will use the constants $\varepsilon_1$, $\varepsilon_2$ and $\varepsilon_3$. They are all linear in $\varepsilon$, and will be determined later.

The intuition behind the algorithm is as follows: Since the graph is dense, the sets $U^i$ should in some sense represent the structure of $V \setminus V^i$. If we pick some variable $v$ from $V^i$ and some assignment to the variables in $V \setminus V^i$ we will, for each assignment $v \leftarrow j$, satisfy some fraction $\phi_j$ of the equations containing $v$ and one variable from $V \setminus V^i$. We then expect $U^i$ to have the property that the fraction of the satisfied equations containing $v$ and one variable from $U^i$ should be close to $\phi_j$. It turns out that the decrease in the number of satisfied equations due to the sampling is $O(\varepsilon n^2)$, which implies that the algorithm is a polynomial time approximation scheme only for dense instances of the problem.

Let us now formalize the intuition. From now on, we fix an assignment $H$ to the variables in $V$ and a partition of $V$ into the $\ell$ parts $V^1, \ldots, V^\ell$. All definitions and results proven below will be relative to these fixed properties, unless stated otherwise.



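Definition 7. We say that the set $U^i$ is good if for all except a fraction of at most $\varepsilon_1$ of the variables $v \in V^i$

$$\left|\frac{S(U^i, H, v \leftarrow j)}{t} - \frac{S(V \setminus V^i, H, v \leftarrow j)}{n - |V^i|}\right| < \varepsilon_2 \quad \text{for all } j \in \mathbb{Z}_p. \tag{6}$$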



Remark 1. What we call a good set is essentially what Fernandez de la Vega [3] 
calls a representative set. 



Lemma 1. For any $\delta > 0$, it is possible to choose the constant $t$ in such a way that the probability of a set $U^i$ being good is at least $1 - \delta/\ell$ for a fixed $i$.

Proof. Fix a variable $v \in V^i$ and some $j \in \mathbb{Z}_p$. Note that the assignment $H$ is fixed; the underlying probability space is the possible choices of $U^i$. We now introduce, for each $w \in V \setminus V^i$, a Bernoulli random variable $\xi_{i,j,v,w}$ with the property that

$$\xi_{i,j,v,w} = \begin{cases} 1 & \text{if } w \in U^i, \\ 0 & \text{otherwise.} \end{cases} \tag{7}$$

We can use these random variables to express the number of satisfied equations containing $v$ and one variable from $U^i$:

$$S(U^i, H, v \leftarrow j) = \sum_{w \in V \setminus V^i} S(\{w\}, H, v \leftarrow j)\,\xi_{i,j,v,w}. \tag{8}$$
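Since $U^i$ is chosen uniformly at random from $V \setminus V^i$,

$$\Pr[\xi_{i,j,v,w} = 1] = \frac{t}{n - |V^i|}. \tag{9}$$

Now we construct the random variable

$$X_{i,j,v} = \frac{S(U^i, H, v \leftarrow j)}{t}. \tag{10}$$

From Eqs. 8 and 9 it follows that

$$E[X_{i,j,v}] = \sum_{w \in V \setminus V^i}\frac{S(\{w\}, H, v \leftarrow j)}{n - |V^i|} = \frac{S(V \setminus V^i, H, v \leftarrow j)}{n - |V^i|}, \tag{11}$$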



which means that we are in good shape if we can bound the probability that $X_{i,j,v}$ deviates more than $\varepsilon_2$ from its mean. At first glance, this seems hard to do, since $X_{i,j,v}$ is a linear combination of dependent random variables, and the coefficients in the linear combination depend on the assignment $H$ and the instance. Since there are, for each $w \in V \setminus V^i$, at most $p(p-1)$ equations containing $v$ and $w$, $S(\{w\}, H, v \leftarrow j)$ can be anywhere between $0$ and $p - 1$, which complicates things even further. Fortunately, we can use martingale tail bounds to reach our goal in spite of our limited knowledge of the probability distribution of $X_{i,j,v}$. Since the sets $U^i$ have cardinality $t$, exactly $t$ different $\xi_{i,j,v,w}$ are non-zero, which means that the sum in Eq. 8 can be written as

$$S(U^i, H, v \leftarrow j) = \sum_{k=1}^{t} Z_k. \tag{12}$$



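Each random variable $Z_k$ corresponds to $S(\{w_k\}, H, v \leftarrow j)$ for some $w_k \in V \setminus V^i$, which implies that $0 \le Z_k \le p - 1$. Thus, the sequence

$$X^m_{i,j,v} = E\left[\frac{1}{t}\sum_{k=1}^{t} Z_k \;\middle|\; Z_1, Z_2, \ldots, Z_m\right] \quad \text{for } m = 0, 1, \ldots, t, \tag{13}$$

is a martingale [8] with the properties that

$$\big|X^t_{i,j,v} - X^0_{i,j,v}\big| = \big|X_{i,j,v} - E[X_{i,j,v}]\big|, \tag{14}$$

$$\big|X^m_{i,j,v} - X^{m-1}_{i,j,v}\big| \le (p-1)/t \quad \text{for all } m \in \{1, \ldots, t\}, \tag{15}$$

which enables us to apply Azuma's inequality [2, 7, 8]:

$$\Pr\big[|X_{i,j,v} - E[X_{i,j,v}]| > \varepsilon_2\big] < 2e^{-\varepsilon_2^2 t/2(p-1)^2}. \tag{16}$$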

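The above inequality is valid for fixed $i$, $j$ and $v$. A set $U^i$ is good if the above inequality holds for all $j$ and for all but $\varepsilon_1|V^i|$ of the vertices in $V^i$. Thus we keep $i$ and $j$ fixed and construct a new family of Bernoulli random variables

$$\eta_{i,j,v} = \begin{cases} 1 & \text{if } |X_{i,j,v} - E[X_{i,j,v}]| > \varepsilon_2, \\ 0 & \text{otherwise,} \end{cases} \tag{17}$$

and set

$$Y_{i,j} = \frac{1}{|V^i|}\sum_{v \in V^i}\eta_{i,j,v}. \tag{18}$$

By Eq. 16,

$$\Pr[\eta_{i,j,v} = 1] < 2e^{-\varepsilon_2^2 t/2(p-1)^2}, \tag{19}$$

and thus

$$E[Y_{i,j}] < 2e^{-\varepsilon_2^2 t/2(p-1)^2}. \tag{20}$$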



We can now use Markov's inequality to bound

$$\Pr[Y_{i,j} > \varepsilon_1] \le \frac{E[Y_{i,j}]}{\varepsilon_1} < \frac{2e^{-\varepsilon_2^2 t/2(p-1)^2}}{\varepsilon_1}. \tag{21}$$



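The above inequality is valid for a fixed $j$, and it can be used to obtain the probability that $U^i$ is good:

$$\Pr[U^i \text{ is good}] = \Pr\left[\bigcap_{j \in \mathbb{Z}_p}\{Y_{i,j} \le \varepsilon_1\}\right] \ge 1 - \frac{2pe^{-\varepsilon_2^2 t/2(p-1)^2}}{\varepsilon_1}. \tag{22}$$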



Finally, we are to determine a suitable value for $t$, in order to make this probability large enough. If we choose

$$t \ge \frac{2(p-1)^2}{\varepsilon_2^2}\ln\frac{2p\ell}{\delta\varepsilon_1}, \tag{23}$$

the probability that $U^i$ is good will be at least $1 - \delta/\ell$.
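Condition (23) is straightforward to evaluate. The Python sketch below (helper names and parameter values are hypothetical) computes the smallest integer $t$ allowed by (23) and checks it against the bound in (22). The size of the result illustrates how large the constants hidden in the running time are.

```python
import math


def sample_size(p, ell, delta, eps1, eps2):
    """Smallest integer t satisfying condition (23)."""
    return math.ceil(2 * (p - 1) ** 2 / eps2 ** 2
                     * math.log(2 * p * ell / (delta * eps1)))


def not_good_bound(p, t, eps1, eps2):
    """Bound from (22): Pr[U^i is not good] <= 2 p exp(-eps2^2 t / 2(p-1)^2) / eps1."""
    return 2 * p * math.exp(-eps2 ** 2 * t / (2 * (p - 1) ** 2)) / eps1


# Illustrative (hypothetical) parameter values.
p, ell, delta, eps1, eps2 = 3, 10, 0.1, 0.05, 0.05
t = sample_size(p, ell, delta, eps1, eps2)
print(t, not_good_bound(p, t, eps1, eps2), delta / ell)
```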



Corollary 1. For any $\delta > 0$ it is possible to choose the constant $t$ in such a way that the probability of all $U^i$ being good is at least $1 - \delta$.

Proof. There are $\ell$ different $U^i$; apply Lemma 1 to each of them and use the union bound.

We construct an assignment $\Pi$ to the variables in $V^i$ by step 3 in Algorithm 1. If $U^i$ is good, we expect the number of equations satisfied by $\Pi$ to be close to the number of equations satisfied by $H$. To formalize this intuition, we need the following definitions.

Definition 8.

$$\mu(\tau) = \frac{\#\text{ equations satisfied by the assignment } \tau}{n^2}. \tag{24}$$



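Definition 9. We say that a variable $v \in V^i$ is unbalanced if there exists a $j^* \in \mathbb{Z}_p$ such that for all $j' \in \mathbb{Z}_p \setminus \{j^*\}$

$$\frac{S(V \setminus V^i, H, v \leftarrow j^*)}{n - |V^i|} \ge \frac{S(V \setminus V^i, H, v \leftarrow j')}{n - |V^i|} + \varepsilon_3. \tag{25}$$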
Lemma 2. Let $\pi = H|_U$ and $\Pi$ be the assignment produced with this choice of $\pi$ in step 3 of Algorithm 1. Denote by $H'$ the assignment which assigns to a variable $v$ the value $H(v)$ if $v \in V \setminus V^i$ and $\Pi(v)$ if $v \in V^i$. Then, if $U^i$ is good, it is for any $\varepsilon > 0$ possible to choose the constant $\ell$ in such a way that

$$\mu(H') \ge \mu(H) - \varepsilon/p\ell. \tag{26}$$



Proof. We want to compare the number of equations satisfied by the assignment $H$ to the number satisfied by $H'$. In particular, we want to bound the decrease in the number of satisfied equations. As only the values assigned to variables in $V^i$ can differ between the two assignments, the possible sources of aberrations are the equations where variables in $V^i$ are involved. We have four different cases:

1. Equations of the type $av_1 + bv_2 = c$, where $v_1, v_2 \in V^i$. There are fewer than $p(p-1)n^2/2\ell^2$ such equations, and at most $(p-1)n^2/2\ell^2$ can be satisfied by any given assignment. The decrease is thus less than $pn^2/\ell^2$.

2. Equations of the form $av + bw = c$ where $v \in V^i$ is unbalanced and satisfies Eq. 6, and $w \notin V^i$. If we combine Eq. 6 and Eq. 25 we obtain the inequality

$$\frac{S(U^i, \pi, v \leftarrow j^*)}{t} > \frac{S(U^i, \pi, v \leftarrow j')}{t} + \varepsilon_3 - 2\varepsilon_2 \quad \text{for all } j'. \tag{27}$$

By the construction of Algorithm 1, the value chosen for $v$ will thus be the correct value, provided that $\varepsilon_3 > 2\varepsilon_2$. It follows that the number of satisfied equations of this type cannot decrease.

3. Equations of the form $av + bw = c$ where $v \in V^i$ is not unbalanced, but still satisfies Eq. 6, and $w \notin V^i$. In this case, Algorithm 1 may select the wrong assignment to $v$. However, that cannot decrease the optimum value by much. Indeed, suppose that $v = j$ in the optimal solution, but the algorithm happens to set $v = j'$. The reason for that can only be that $S(U^i, \pi, v \leftarrow j') \ge S(U^i, \pi, v \leftarrow j)$. By Eq. 6, this implies that

$$\frac{S(V \setminus V^i, H, v \leftarrow j)}{n - |V^i|} - \frac{S(V \setminus V^i, H, v \leftarrow j')}{n - |V^i|} < 2\varepsilon_2. \tag{28}$$

Since there are at most $|V^i|$ different $v$ that are not unbalanced, we can bound the decrease in the number of satisfied equations by

$$|V^i|\big(S(V \setminus V^i, H, v \leftarrow j) - S(V \setminus V^i, H, v \leftarrow j')\big) < 2\varepsilon_2 n^2/\ell. \tag{29}$$

4. Equations of the form $av + bw = c$ where $v \in V^i$ does not satisfy Eq. 6, and $w \notin V^i$. By construction there are at most $\varepsilon_1|V^i|$ such variables in $V^i$. The number of equations of this type is thus less than $\varepsilon_1p^2|V^i|n$. Only $\varepsilon_1p|V^i|n$ of these can be satisfied at the same time, and hence a bound on the decrease is $\varepsilon_1p|V^i|n = \varepsilon_1pn^2/\ell$.

Summing up, the total decrease is at most

$$pn^2/\ell^2 + 2\varepsilon_2n^2/\ell + \varepsilon_1pn^2/\ell. \tag{30}$$
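If we select $\ell = 2p^2/\varepsilon$, $\varepsilon_1 = \varepsilon/4p^2$, $\varepsilon_2 = \varepsilon/8p$, and $\varepsilon_3 = \varepsilon/3p$, the total decrease is at most

$$\varepsilon n^2/2p\ell + \varepsilon n^2/4p\ell + \varepsilon n^2/4p\ell = \varepsilon n^2/p\ell, \tag{31}$$

which concludes the proof.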
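The arithmetic behind (31) can be confirmed with exact rational arithmetic. This Python sketch (ours) plugs the parameter choice above into the bound (30), checks that the result equals $\varepsilon n^2/p\ell$, and checks that $\varepsilon_3 > 2\varepsilon_2$ as required in case 2. In practice $\ell$ must be an integer; rounding it up keeps the total at most $\varepsilon n^2/p\ell$, since $pn^2/\ell^2 \le \varepsilon n^2/2p\ell$ whenever $\ell \ge 2p^2/\varepsilon$.

```python
from fractions import Fraction


def lemma2_parameters(p, eps):
    """Choice made before (31): ell = 2p^2/eps, eps1 = eps/4p^2, eps2 = eps/8p, eps3 = eps/3p."""
    eps = Fraction(eps)
    return {"ell": 2 * p ** 2 / eps, "eps1": eps / (4 * p ** 2),
            "eps2": eps / (8 * p), "eps3": eps / (3 * p)}


def decrease_bound(p, n, ell, eps1, eps2):
    """Bound (30): p n^2 / ell^2 + 2 eps2 n^2 / ell + eps1 p n^2 / ell."""
    return p * n ** 2 / ell ** 2 + 2 * eps2 * n ** 2 / ell + eps1 * p * n ** 2 / ell


for p in (2, 3, 5, 7):
    for eps in (Fraction(1, 2), Fraction(1, 10)):
        par = lemma2_parameters(p, eps)
        total = decrease_bound(p, 1000, par["ell"], par["eps1"], par["eps2"])
        assert total == eps * 1000 ** 2 / (p * par["ell"])   # equality (31)
        assert par["eps3"] > 2 * par["eps2"]                 # needed in case 2
print("(31) holds exactly for the sampled p and eps")
```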

Corollary 2. If all $U^i$ are good and we construct from an assignment $\pi = H|_U$ a new assignment $\Pi$ as in step 3 of Algorithm 1, it is for all $\varepsilon > 0$ possible to choose the constant $\ell$ in such a way that $\mu(\Pi) \ge \mu(H) - \varepsilon/p$.

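Proof. Let $H_0 = H$ and let $H_i$, $i = 1, 2, \ldots, \ell$, satisfy $H_i|_{V^i} = \Pi|_{V^i}$ and $H_i|_{V \setminus V^i} = H_{i-1}|_{V \setminus V^i}$. Apply Lemma 2 $\ell$ times, once for each $H_i$; each application loses at most $\varepsilon/p\ell$, so the total loss is at most $\varepsilon/p$. This corresponds to the way Algorithm 1 works.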

The only observation needed now is that since all results above are valid for any choice of the assignment $H$, they are in particular valid when $H$ is the assignment producing the optimum number of satisfied equations.

Theorem 1. For instances of Max E2-Lin mod $p$ where the number of equations is $\Theta(n^2)$, Algorithm 1 is a randomized polynomial time approximation scheme.

Proof. Consider the case when the assignment $H$ is the optimal assignment. Since all possible assignments $\pi$ to the variables in the set $U$ are tried by the algorithm, the optimal assignment $H$ restricted to $U$ will eventually be tried. Corollaries 1 and 2 show that the algorithm produces, with probability at least $1 - \delta$, an assignment $\Pi$ such that $\mu(\Pi) \ge \mu(H) - \varepsilon/p$. An additive error of $\varepsilon/p$ in $\mu(\tau)$ translates into an additive error of $\varepsilon n^2/p$ for the equation problem. Since the optimum of a Max E2-Lin mod $p$ instance with $cn^2$ equations is at least $cn^2/p$ (a uniformly random assignment satisfies each equation with probability $1/p$), this gives a relative error of at most $\varepsilon/c$.

4 The general case 

The algorithm described in the previous section is easily generalized to handle instances of Max $k$-Function Sat mod $p$ as well; it does not use any special feature of the Max E2-Lin mod $p$ problem. As for Max E2-Lin mod $p$, we assume that the set of functions does not contain any redundancy: all functions are assumed to be different. This is actually a weaker constraint than the one imposed on Max E2-Lin mod $p$; in the context of Max $k$-Function Sat mod $p$ problems, $ax_i + bx_{i'} - c$ and $adx_i + dbx_{i'} - dc$ (for $d \notin \{0, 1\}$) are considered distinct functions, whereas the corresponding Max E2-Lin mod $p$ equations would be considered identical.

The analysis assumes that all functions in the instance are satisfiable. This is needed to ensure that the optimum of an instance with $cn^k$ functions is at least $cn^k/p^k$.

We can adopt the techniques used in the proofs for the Max E2-Lin mod p 
case to this more general case with some minor modifications. 




Sampling Methods Applied to Dense Instances of Non-Boolean Optimization Problems 



365 



Definition 10. We extend the notation $S(X, \tau, x \leftarrow r)$ to mean the number of satisfied functions with $k - 1$ variables from the set $X$ and one variable $x \notin X$, given that the variables in $X$ are assigned values according to the function $\tau$ and $x$ is assigned the value $r$.

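Definition 11. We say that the set $U^i$ is good if for all except a fraction of at most $\varepsilon_1$ of the variables $v \in V^i$

$$\left|\frac{S(U^i, H, v \leftarrow j)}{\binom{t}{k-1}} - \frac{S(V \setminus V^i, H, v \leftarrow j)}{\binom{n - |V^i|}{k-1}}\right| < \varepsilon_2 \quad \text{for all } j \in \mathbb{Z}_p. \tag{32}$$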



Lemma 3. For any $\delta > 0$, it is possible to choose the constant $t$ in such a way that the probability of a set $U^i$ being good is at least $1 - \delta/\ell$ for a fixed $i$.

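Proof (sketch). Fix a variable $v$ and some $j \in \mathbb{Z}_p$. If we introduce, for each $w \subseteq V \setminus V^i$ such that $|w| = k - 1$, a Bernoulli random variable $\xi_{i,j,v,w}$ with the property that

$$\xi_{i,j,v,w} = \begin{cases} 1 & \text{if } w \subseteq U^i, \\ 0 & \text{otherwise,} \end{cases} \tag{33}$$

we can write

$$S(U^i, H, v \leftarrow j) = \sum_{\substack{w \subseteq V \setminus V^i \\ |w| = k-1}} S(w, H, v \leftarrow j)\,\xi_{i,j,v,w}. \tag{34}$$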



There are $p^{p^k}$ functions from $\mathbb{Z}_p^k$ to $\mathbb{Z}_p$ and a fraction $1/p$ of these are satisfied simultaneously, which implies that

$$0 \le S(w, H, v \leftarrow j) \le p^{p^k - 1}. \tag{35}$$

To simplify the notation, we put

$$T = \binom{t}{k-1}. \tag{36}$$

As in the proof of Lemma 1, we define

$$X_{i,j,v} = \frac{S(U^i, H, v \leftarrow j)}{T}. \tag{37}$$

The sum in Eq. 34 contains $T$ non-zero terms. We construct a martingale $X^0_{i,j,v}, \ldots, X^T_{i,j,v}$ by conditioning on these terms, as in the proof of Lemma 1. The Lipschitz condition in Eq. 15 then changes to

$$\big|X^m_{i,j,v} - X^{m-1}_{i,j,v}\big| \le \frac{p^{p^k-1}}{T} \quad \text{for all } m \in \{1, \ldots, T\}. \tag{38}$$
By choosing

$$t \ge k - 2 + \left(\frac{2(k-1)!\,p^{2p^k-2}}{\varepsilon_2^2}\ln\frac{2p\ell}{\delta\varepsilon_1}\right)^{1/(k-1)}, \tag{39}$$

it can be verified that the probability that $U^i$ is good is at least $1 - \delta/\ell$.
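For the general case, the condition (39) guarantees $\binom{t}{k-1} \ge T_{\min}$, where $T_{\min}$ is the number of sampled $(k-1)$-sets that Azuma's inequality needs with the Lipschitz constant of (38); it gets there through the inequality $\binom{t}{k-1} \ge (t-k+2)^{k-1}/(k-1)!$. The Python sketch below (our illustration; the parameter values are hypothetical) computes $T_{\min}$, the value of $t$ from (39), and, for comparison, the smallest $t$ with $\binom{t}{k-1} \ge T_{\min}$.

```python
import math


def required_count(p, k, ell, delta, eps1, eps2):
    """Smallest T with 2p exp(-eps2^2 T / 2c^2) / eps1 <= delta/ell, where c = p^(p^k - 1)."""
    c = p ** (p ** k - 1)  # T times the Lipschitz constant in (38)
    return 2 * c ** 2 / eps2 ** 2 * math.log(2 * p * ell / (delta * eps1))


def t_from_39(p, k, ell, delta, eps1, eps2):
    """Integer t meeting the sufficient condition (39)."""
    x = math.factorial(k - 1) * required_count(p, k, ell, delta, eps1, eps2)
    return math.ceil(k - 2 + x ** (1.0 / (k - 1)))


def smallest_t(p, k, ell, delta, eps1, eps2):
    """Smallest t with C(t, k-1) >= required_count, found by direct search."""
    need, t = required_count(p, k, ell, delta, eps1, eps2), k - 1
    while math.comb(t, k - 1) < need:
        t += 1
    return t


args = dict(p=2, k=3, ell=10, delta=0.1, eps1=0.05, eps2=0.05)  # hypothetical values
t39, t_min = t_from_39(**args), smallest_t(**args)
print(t39, t_min, math.comb(t39, args["k"] - 1) >= required_count(**args))
```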

Lemma 4. Let $\pi = H|_U$ and $\Pi$ be the assignment produced with this choice of $\pi$ in step 3 of Algorithm 1. Denote by $H'$ the assignment which assigns to a variable $v$ the value $H(v)$ if $v \in V \setminus V^i$ and $\Pi(v)$ if $v \in V^i$. Then, if $U^i$ is good, it is for any $\varepsilon > 0$ possible to choose the constant $\ell$ in such a way that

$$\mu(H') \ge \mu(H) - \varepsilon/p^k\ell. \tag{40}$$



Proof. To bound the decrease in the number of satisfied functions, we perform 
a case analysis similar to that in the proof of Lemma 2. 



1. Functions which depend on more than one variable from $V^i$. At most

$$\frac{p^{p^k-1}n^k}{\ell^2} \tag{41}$$

such functions can be satisfied by any given assignment.

2. Functions depending on $v \in V^i$ but not on any other variable in $V^i$, where $v$ is unbalanced and satisfies Eq. 32. As in the proof of Lemma 2, the number of satisfied functions of this type does not decrease if $\varepsilon_3 > 2\varepsilon_2$.

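3. Functions depending on $v \in V^i$ but not on any other variable in $V^i$, where $v$ is not unbalanced but satisfies Eq. 32. In this case Algorithm 1 can select the wrong assignment to $v$. Suppose that $v = j$ in the optimal solution but that the algorithm sets $v = j'$. This implies that $S(U^i, \pi, v \leftarrow j') \ge S(U^i, \pi, v \leftarrow j)$ and by Eq. 32

$$\frac{S(V \setminus V^i, H, v \leftarrow j)}{\binom{n - |V^i|}{k-1}} - \frac{S(V \setminus V^i, H, v \leftarrow j')}{\binom{n - |V^i|}{k-1}} < 2\varepsilon_2. \tag{42}$$

Since there are at most $|V^i|$ different $v$ that are not unbalanced, we can bound the decrease in the number of satisfied functions by

$$|V^i|\big(S(V \setminus V^i, H, v \leftarrow j) - S(V \setminus V^i, H, v \leftarrow j')\big) < \frac{2\varepsilon_2n^k}{(k-1)!\,\ell}. \tag{43}$$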

4. Functions depending on $v \in V^i$ but not on any other variable in $V^i$, where $v$ does not satisfy Eq. 32. By construction there are at most $\varepsilon_1|V^i|$ such variables in $V^i$. The number of functions of this type is thus less than $\varepsilon_1|V^i|p^{p^k}n^{k-1}/(k-1)!$. Only $\varepsilon_1|V^i|p^{p^k-1}n^{k-1}/(k-1)! = \varepsilon_1p^{p^k-1}n^k/\ell(k-1)!$ of these can be satisfied at the same time, which gives us a bound on the decrease.
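Summing up, the total decrease is

$$\frac{p^{p^k-1}n^k}{\ell^2} + \frac{2\varepsilon_2n^k}{(k-1)!\,\ell} + \frac{\varepsilon_1p^{p^k-1}n^k}{(k-1)!\,\ell}. \tag{44}$$

By choosing

$$\ell = 2p^{p^k+k-1}/\varepsilon, \tag{45}$$

$$\varepsilon_1 = \varepsilon(k-1)!/4p^{p^k+k-1}, \tag{46}$$

$$\varepsilon_2 = \varepsilon(k-1)!/8p^k, \tag{47}$$

$$\varepsilon_3 = \varepsilon(k-1)!/3p^k, \tag{48}$$

the total decrease becomes at most $\varepsilon n^k/p^k\ell$.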
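As in the $k = 2$ case, the choice (45)-(48) can be verified with exact fractions. This Python sketch (ours) evaluates the bound (44) for a few values of $p$, $k$ and $\varepsilon$, confirms that it equals $\varepsilon n^k/p^k\ell$, and confirms that $\varepsilon_3 > 2\varepsilon_2$.

```python
from fractions import Fraction
from math import factorial


def general_parameters(p, k, eps):
    """Parameter choices (45)-(48)."""
    eps, big = Fraction(eps), Fraction(p) ** (p ** k + k - 1)
    f = factorial(k - 1)
    return {"ell": 2 * big / eps, "eps1": eps * f / (4 * big),
            "eps2": eps * f / (8 * p ** k), "eps3": eps * f / (3 * p ** k)}


def general_decrease_bound(p, k, n, ell, eps1, eps2):
    """Bound (44) on the decrease in the number of satisfied functions."""
    f, top = factorial(k - 1), Fraction(p) ** (p ** k - 1)
    return (top * n ** k / ell ** 2 + 2 * eps2 * n ** k / (f * ell)
            + eps1 * top * n ** k / (f * ell))


for p, k in ((2, 2), (3, 2), (2, 3), (3, 3)):
    for eps in (Fraction(1, 2), Fraction(1, 10)):
        par = general_parameters(p, k, eps)
        bound = general_decrease_bound(p, k, 100, par["ell"], par["eps1"], par["eps2"])
        assert bound == eps * 100 ** k / (p ** k * par["ell"])
        assert par["eps3"] > 2 * par["eps2"]
print("(44)-(48) give a total decrease of exactly eps n^k / (p^k ell)")
```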

Theorem 2. For instances of Max k-Function Sat mod $p$ where the number of satisfiable functions is $\Theta(n^k)$, Algorithm 1 is a randomized polynomial time approximation scheme.



Proof. This follows the proof of Theorem 1, the only difference being that the optimum of a Max $k$-Function Sat mod $p$ instance with $cn^k$ satisfiable functions is at least $cn^k/p^k$.



5 Conclusions 

We have shown how to construct a randomized polynomial time approximation scheme for dense instances of Max $k$-Function Sat mod $p$. The algorithm is intuitive, and shows the power of exhaustive sampling. The running time of the algorithm is quadratic in the number of variables, albeit with large constants. Using methods similar to those of Goldreich, Goldwasser and Ron [5], we can convert our algorithm into a randomized constant time approximation scheme. The algorithms in this scheme only compute numerical approximations to the optimum; they do not construct assignments achieving this optimum.

As a special case, Theorem 2 implies the existence of a polynomial time approximation scheme for dense instances of Max E3-Lin mod $p$. This result is interesting when compared to the lower bounds found by Håstad [6] for systems of equations with at least 3 variables in each equation: in the general case, there is no polynomial time approximation algorithm achieving a performance ratio of $p - \varepsilon$ for any $\varepsilon > 0$ unless P = NP. Zwick [10] studied the problem of finding almost-satisfying assignments for almost-satisfiable instances of some constraint satisfaction problems. Also for this restricted problem, approximating Max E$k$-Lin mod 2 (in the sense defined by Zwick) is hard by the results of Håstad [6].

It is well known [9, Theorem 13.8] that a Max-SNP problem can be viewed as a Max $k$-Function Sat problem for some fixed integer $k$. Arora, Karger and Karpinski [1] call an instance of a Max-SNP problem dense if the instance of Max $k$-Function Sat produced using it has $\Theta(n^k)$ functions. We have shown in this paper that there are natural extensions of these concepts to functions mod $p$. This extends the notion of denseness also to non-boolean optimization problems.
Abstract. We study local-control algorithms for maximum flow and multicommodity flow problems in distributed networks. We propose a second-order method for accelerating the convergence of the “first-order” distributed algorithms recently proposed by Awerbuch and Leighton. Our experimental study shows that second-order methods are significantly faster than the first-order methods for approximate single- and multicommodity flow problems. Furthermore, our experimental study gives valuable insights into the diffusive processes that underlie these local-control algorithms; this leads us to identify many open technical problems for theoretical study.



1 Introduction 

The multicommodity flow problem is the problem of simultaneously shipping multiple 
commodities through a capacitated network such that the total amount of flow on each edge is 
no more than the capacity of the edge. Each commodity i has a source node, a sink node, 
and an associated demand $d_i$, which is the amount of that commodity that must be shipped 
from its source to its sink. The objective is to find a flow that meets the individual demands of 
all the commodities without exceeding any edge capacity (finding a feasible flow¹). The case 
when there is only a single commodity and the goal is to maximize the feasible flow is the 
well-known maximum flow problem. The importance of the single- and multicommodity flow 
problems need hardly be stressed: a substantial body of work in Algorithms and Operations 
Research is devoted to these problems. 

In this paper, we focus on local-control (or distributed) algorithms for the single- and 
multicommodity flow problems. Besides their inherent interest, local-control algorithms for these 
problems are relevant for the following reasons: 

(1) Many routing, communication, and flow-control problems between multiple senders and 
receivers, including various uni/broad/multicasts, can be modeled as multicommodity flow 
problems on networks (e.g., see the references in [BG91, BT89, AL93, AL94, AAB97]). 
These applications typically require online, local-control (distributed) algorithms, since 
global communication and control is expensive and cumbersome. Local algorithms for 
multicommodity flow not only provide a generic solution to these problems, but they also 
give valuable insights for the centralized/global solution of these problems. 

¹ An alternate objective is to maximize z such that the flow satisfies a percentage z of every demand 
without exceeding any edge capacity (called the concurrent flow problem [SM86]); we do not consider 
this version here. 

M. Luby, J. Rolim, and M. Serna (Eds.): RANDOM’98, LNCS 1518, pp. 369-384, 1998. 

(2) The best currently known algorithms for maximum flow and multicommodity flow 
problems are fairly sophisticated (see, e.g., [GR97, GT88, K97, LM+91, V89]), and typically 
rely on augmenting paths, blocking flows, min-cost flows, or linear programming. In 
contrast, local-control algorithms are appealingly simple, as they rely on simple “edge balancing” 
strategies of appropriately balancing commodities between adjacent nodes (details 
below). Thus, they are easy to implement, understand, and experiment with. 

(3) Local-control algorithms have several other attractive features. For example, they adjust 
gracefully to dynamic changes in the topology (e.g., link failures) and the traffic demands 
(e.g., bursty multicasts) in communication networks. They are iterative, that is, running 
them longer gives progressively better approximations to the optimal solution. Hence, one 
can use them either for rapid coarse solutions or for slow refined solutions. Finally, they 
may expose alternate structure in the problem, as the convergence of such local-control 
algorithms is typically related to the eigenstructure of the network (for intuition, see [C89]). 

1.1 First-Order Algorithms 

Local-control algorithms for the multicommodity flow problem were recently designed by 
Awerbuch and Leighton [AL93, AL94]. Their algorithms proceed in parallel rounds. At the 
start of a round, (approximately) $d_i$ units of commodity i are added to the source node of that 
commodity, where $d_i$ is the demand of commodity i. The commodities accumulated in each 
node are then distributed equally among the local endpoints of the incident edges, and flow is 
pushed across each edge of the network so as to “balance” each commodity between the two 
endpoints of the edge (subject to edge capacity constraints). Finally, any commodity that has 
arrived at the appropriate sink is removed from the network. How to trade off the flow between 
different commodities that compete for the capacity of an edge is nontrivial. Awerbuch and 
Leighton proved in [AL93, AL94] that this simple “edge balancing” algorithm (and some of its 
variants) converges and, perhaps surprisingly, that it provides a provably approximate 
solution to the multicommodity flow problem in a small number of rounds. 

We refer to such edge-balancing algorithms as first-order algorithms. The first-order 
algorithms in [AL93, AL94] can clearly be implemented on a distributed network in which 
each node communicates only with neighboring nodes and has no global knowledge of the 
network.² Similar local-control algorithms have been designed for several other problems 
[LW95], including distributed load balancing [C89, AA+93, MGS98] and end-to-end 
communication [AMS89]. 

A particularly simple local-control algorithm can be obtained for the case of the maximum 
flow problem by specializing the first-order algorithm in [AL93, AL94] for the single-commodity 
case. There are many other algorithms for the maximum flow problem, but none 
that is a distributed first-order algorithm. The algorithm most closely related in spirit is the 
algorithm of Goldberg and Tarjan in [GT88], where a “preflow” is adjusted into a flow by 
pushing excess local flow towards the sink along estimated shortest paths. However, this 
algorithm needs to maintain estimated shortest-path information and is thus less amenable to a 
distributed, local-control implementation in dynamic networks. 

² In contrast, other approximation algorithms for the multicommodity flow problem rely on global 
computations [V89, LM+91]. 
1.2 Second-Order Algorithms 

In this paper, we initiate a new direction in distributed flow algorithms aimed at speeding up 
the first-order algorithms of [AL93, AL94] for the multicommodity flow problem. The basic 
idea is that in any round, we use the knowledge of the amount of flow that was sent across the 
edge in the previous round in order to appropriately adjust the flow sent in the current round. 
Specifically, for a parameter $\beta$, the flow sent across an edge is chosen as $\beta$ times what would 
be sent by the first-order algorithm, plus $\beta - 1$ times what was actually sent across the edge in 
the previous round. (A more detailed description of these methods is given in Sections 3 and 4.) 

We call algorithms derived in this manner second-order algorithms. Perhaps surprisingly, 
the main conclusion of this paper is that second-order algorithms appear to substantially 
outperform their first-order counterparts for maximum flow and multicommodity flow problems, 
as shown by our experiments. 

1.3 Background and Related Work 

First-Order methods. The first-order algorithm of Awerbuch and Leighton for the maximum 
flow problem is conceptually similar to the probabilistic phenomena of diffusion and random 
walks. The algorithm behaves like diffusion, since the excess flow always flows down the 
gradient along each edge. For simpler problems such as distributed load balancing, if one 
considers the vector of flows accumulated at the nodes as iterations progress, the successive vectors 
can be modeled as transitions of a Markov Chain, or a suitable random walk [C89]. However, for the general 
multicommodity flow problem, these conceptual similarities have not yet been formalized. 
The analysis of Awerbuch and Leighton is sophisticated even for the case of the maximum 
flow problem. It does not rely on Markov Chain methods, and is entirely combinatorial. 

First-order algorithms for flow problems are also related to matrix-iterative methods for 
solving linear systems, and in particular, the Gauss-Seidel iterations. This connection is made 
explicit in [BT89]. Also, there is a way to interpret the first-order algorithms as iteratively 
solving a dual network optimization problem involving a single variable per node. At each 
iteration, the dual variables of a single node or its incident edge flows are changed in an attempt 
to improve the dual cost. This process is also explained in [BT89]. 

Thus, there are intriguing connections between the first-order methods for flow problems 
and classical techniques such as matrix-iterative methods, diffusion, random walks and 
primal-dual relaxations. These techniques have been studied in different areas with somewhat different 
emphasis, but seem directly relevant to the work in [AL93, AL94]. 

Second-Order methods. Second-order algorithms, as described above, may seem ad hoc, and 
further explanation is needed to motivate them. Our second-order algorithms are motivated 
by the observation that the first-order flow algorithms in [AL93] are iterative methods 
reminiscent of the matrix-iterative methods used for solving systems of linear equations. There 
is already a mature body of knowledge about speeding up these first-order methods (see, e.g., 
[A94, BB+93, HY81, Var62]). Very recently, these methods were explored for speeding up 
diffusive load-balancing schemes [MGS98]. Of the many known iterative techniques, the authors 
in [MGS98] identified a specific second-order scheme best suited for distributed 
implementations, and our second-order scheme for the multicommodity flow problem is inspired by that 
method. 







S. Muthukrishnan and T. Suel 



There are fundamental similarities between our work here and the work in [MGS98] for 
distributed load balancing, but there are fundamental differences as well. The basic similarity 
is that our algorithmic strategy for second-order methods relies on the same stationary 
acceleration of the first-order method, determined by a parameter $\beta$ (fixed throughout all iterations), 
as that in [MGS98]. The main difference arises from the fact that the multicommodity 
flow problem is much more general than the distributed load-balancing problem considered in 
[MGS98]. First, the edges in our problem have capacity constraints, while the edge capacities 
are unbounded in the load-balancing problem. Second, our algorithms are dynamic in that they 
introduce new flow in each round as described in Section 3; in contrast, the total load remains 
unchanged in [MGS98]. There are other differences (such as the fact that we do not use IOUs 
as in [MGS98]), but we omit these details. 

The similarity of iterative flow algorithms to matrix-iterative methods and distributed load 
balancing is helpful. In particular, known results [Var62] show that $0 < \beta < 2$ is the only 
suitable range for the convergence of that iterative method. Furthermore, from the results in 
[MGS98], we would expect that the second-order method will be outperformed by the 
first-order method for $0 < \beta < 1$, and thus the fruitful range for $\beta$ is $(1, 2)$; as we will see, 
this also holds for distributed flow problems.³ However, the above-mentioned differences 
explain the considerable difficulty in analyzing the first-order and second-order methods for the 
multicommodity flow problem [AL93, AL94]. The first-order method for distributed load 
balancing can be analyzed fairly easily based on stationary Markov Chain methods [C89], and 
known second-order analyses for matrix-iterative methods can be fairly easily adapted to load 
balancing [MGS98]. However, standard approaches (e.g., based on Dirichlet boundary 
conditions [C97] for analyzing dynamic situations) do not seem to apply if edges have capacity 
constraints. 
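The stationary second-order scheme that [MGS98] uses for load balancing is easiest to see in that simpler setting, where there are no capacities and no new load is injected. The following Python sketch is only an illustration under our own assumptions (a path of n nodes, a diffusion coefficient alpha = 0.25, all load initially on one node; the function names are hypothetical), not the flow algorithm of this paper. It applies $x_{t+1} = \beta M x_t + (1 - \beta) x_{t-1}$, where M is one first-order diffusion step, and counts the iterations until every node is within a tolerance of the average load.

```python
def diffuse(x, alpha=0.25):
    """One first-order diffusion step M on a path graph (assumed topology):
    each edge (v, v+1) moves alpha times the load difference between its endpoints."""
    y = list(x)
    for v in range(len(x) - 1):
        d = alpha * (x[v + 1] - x[v])
        y[v] += d
        y[v + 1] -= d
    return y


def iterations_to_balance(beta, n=20, tol=1e-6, max_iters=20000):
    """Iterations of x_{t+1} = beta*M(x_t) + (1 - beta)*x_{t-1} until every node is
    within tol of the average load (1.0 here); None if that never happens."""
    x_prev = [float(n)] + [0.0] * (n - 1)  # all load on node 0, so the average is 1.0
    x = diffuse(x_prev)                     # the first step is a plain first-order step
    for t in range(1, max_iters + 1):
        if max(abs(v - 1.0) for v in x) < tol:
            return t
        m = diffuse(x)
        # Stationary second-order update; beta = 1 gives back the first-order step.
        x, x_prev = [beta * a + (1.0 - beta) * b for a, b in zip(m, x_prev)], x
    return None


for beta in (0.5, 1.0, 1.9):
    print(beta, iterations_to_balance(beta))
print(2.1, iterations_to_balance(2.1, max_iters=2000))  # outside (0, 2): does not balance
```

In this toy setting one would expect $\beta = 0.5$ to need more iterations than the first-order step, $\beta = 1.9$ far fewer, and $\beta > 2$ to diverge, which matches the ranges quoted above; the capacity constraints and injected flow of the flow problem are exactly what this sketch leaves out.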

1.4 Contents of this Paper 

In this paper, we propose second-order methods for accelerating the distributed flow 
algorithms proposed by Awerbuch and Leighton [AL93, AL94]. We perform an experimental study 
and show that the second-order algorithms are significantly faster than the first-order ones of 
[AL93, AL94] both for the maximum flow and the multicommodity flow problems. This is of 
possible applied interest as an online distributed solution for many routing problems arising in 
communication networks. Surprisingly, our algorithms seem to be of interest in the off-line, 
centralized context as well. While our algorithms are not as fast as the best known algorithms 
for the maximum flow problem, they seem to be at least competitive with (and possibly much 
faster than) the best known algorithms for the approximate multicommodity flow problem. 
This is a bit surprising since the best known centralized algorithms for the multicommodity 
flow problem [LM+91] use sophisticated techniques; in contrast, the first-order and 
second-order algorithms are exceedingly simple. 

Our experimental study also leads to a number of observations and conjectures about the 
behavior of the diffusive processes used in the first- and second-order flow algorithms. We 
describe some of these as open problems for theoretical study. 

³ See [MGS98, Var62] for results on choosing the “best” value of $\beta$, and [DMN97] for choosing the best 
$\beta$ for distributed load balancing as a function of the graph structure. We plan to perform an 
experimental study of the best choices of $\beta$ for flow problems on different classes of input graphs in the near 
future. 
The remainder of this paper is organized as follows. The next section provides some 
definitions and notations used throughout the paper. Section 3 describes the first-order and 
second-order methods for the maximum flow problem, and presents a variety of experimental results. 
These results also give the reader intuition about the behavior of first- and second-order 
algorithms for flow problems. Section 4 describes the algorithms and experimental results for 
the case of multicommodity flow; these results are more interesting in terms of comparative 
performance. A few open questions appear in Section 5. 

We have a fully functional implementation with a graphical interface for visualizing the 
behavior of our algorithms. Some additional information about our implementation and the 
input instances used in our experiments is contained in the appendix. 

2 Preliminaries 

Throughout this paper, we assume a network (or graph) G = (V, E) with n nodes and m 
edges. We assume a model of the graph in which each edge e in the network has one capacity 
$c_1(e) > 0$ in one direction, and another capacity $c_2(e) > 0$ in the other direction.⁴ Each node 
v has one queue for each incident edge. This queue can hold an unbounded amount of flow (or 
commodity), and should be considered as being located at the endpoint v of the edge. 

In the case of the maximum flow problem, we are given a source node s and a sink node t, 
and our goal is to maximize the flow between s and t. In the multicommodity flow problem with 
k commodities, we are given k source/sink pairs $(s_i, t_i)$ and corresponding demands $d_i$, and 
we are interested in finding a flow that satisfies the demands of all commodities, if such a flow 
exists. In the description of the algorithms, we use $\Delta_i(e)$ (or $\Delta(e)$ in the single-commodity 
case) to denote the difference between the amounts of commodity i located in the queues at 
the two endpoints of edge e. 

3 Maximum Flow 

In this section, we focus on the maximum flow problem. This special case of the 
multicommodity flow problem leads to particularly simple and efficient versions of the first-order and 
second-order methods. In the first subsection, we describe the first-order local-control 
algorithm for maximum flow. In Subsection 3.2 we explain our new second-order method, while 
Subsection 3.3 presents and discusses our experimental results. 

3.1 First-Order Distributed Maximum Flow 

We now describe the first-order algorithm for maximum flow. The algorithm proceeds in a 
number of synchronous parallel rounds (or iterations), where in each round, a small set of 
elementary operations is performed in each node and each edge of the network. In particular, 
each round consists of the following steps. 

⁴ Thus, each edge is equivalent to two directed edges with their own capacities $c_1(e)$ and $c_2(e)$. However, 
our algorithms and implementations also extend to a graph model where the capacity of each edge is 
shared between the two directions. 
(1) Add d units of flow to the source node, where d is chosen as the sum of the capacities of 
the outgoing edges (or some other upper bound on the value of the maximum flow). 

(2) In each node v, partition the flow that is currently in the node evenly among the $\delta(v)$ local 
queues of the $\delta(v)$ incident edges. 

(3) In each edge e, attempt to balance the amount of commodity between the two queues 
at the endpoints of the edge, by routing $\min\{\Delta(e)/2,\, c(e)\}$ units of flow across the edge, 
where $\Delta(e)$ is the difference in the amount of flow between the two queues, and c(e) is 
the capacity of the edge in the direction from the fuller to the emptier queue. 

(4) Remove all flow that has reached the sink from the network. 
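The four steps above can be sketched as a sequential simulation of one synchronous round. This is a minimal Python sketch under our own assumptions, not the authors' implementation: the graph is a list of edges (u, v, c_uv, c_vu) with one capacity per direction, queues[(node, edge_index)] holds the flow in that node's queue for that edge, and the name first_order_round is hypothetical. Step (3) uses the balancing amount $\min\{\Delta(e)/2, c(e)\}$.

```python
from collections import defaultdict


def first_order_round(edges, queues, source, sink, d):
    """One synchronous round of the first-order max-flow method (illustrative sketch).

    edges  -- list of (u, v, cap_uv, cap_vu)
    queues -- defaultdict(float) keyed by (node, edge_index)
    Returns the amount of flow removed at the sink in this round.
    """
    incident = defaultdict(list)
    for i, (u, v, _, _) in enumerate(edges):
        incident[u].append(i)
        incident[v].append(i)

    # Steps (1) and (2): inject d at the source, then split each node's flow
    # evenly among the queues of its incident edges.
    for node, idx in incident.items():
        total = sum(queues[(node, i)] for i in idx)
        if node == source:
            total += d
        for i in idx:
            queues[(node, i)] = total / len(idx)

    # Step (3): on each edge, move min(delta/2, c) from the fuller to the emptier queue.
    # Every queue belongs to exactly one edge, so the order of the edges does not matter.
    for i, (u, v, cap_uv, cap_vu) in enumerate(edges):
        delta = queues[(u, i)] - queues[(v, i)]
        src, dst, cap = (u, v, cap_uv) if delta >= 0 else (v, u, cap_vu)
        f = min(abs(delta) / 2, cap)
        queues[(src, i)] -= f
        queues[(dst, i)] += f

    # Step (4): remove all flow that has reached the sink.
    arrived = 0.0
    for i in incident[sink]:
        arrived += queues[(sink, i)]
        queues[(sink, i)] = 0.0
    return arrived


# Path s - a - t: the source edge has capacity 3 but the edge a - t only 1,
# so d = 3 exceeds the maximum flow value 1.
edges = [("s", "a", 3.0, 3.0), ("a", "t", 1.0, 1.0)]
queues = defaultdict(float)
history = [first_order_round(edges, queues, "s", "t", d=3.0) for _ in range(50)]
print(history[-1])  # 1.0
```

In this small example, flow piles up at the source while the amount arriving at the sink settles at the maximum flow value, consistent with the observation that the method does not need a feasible flow of value d.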

We point out that this algorithm is a simplified version of the algorithm in [AL93] for 
the single-commodity case; the simplification results from the fact that we do not have to 
resolve any contention between different commodities. One consequence is that the algorithm 
correctly finds the maximum flow even if d is much larger than the value of that flow, that is, 
the algorithm does not rely on the existence of a feasible flow of value d. 

3.2 Second-Order Distributed Maximum Flow 

We now describe how to obtain a second-order method for distributed maximum flow. As 
already mentioned in the introduction, the second-order method computes the flow to be sent 
across an edge in the current round as a linear combination of the flow that would be sent 
according to the first-order method and the flow that was sent in the previous iteration. The 
second-order method has an additional parameter $\beta$, with the case $\beta = 1.0$ being identical to 
the first-order method. More precisely, Step (3) of the above algorithm becomes: 

(3a) In each edge e, compute the desired flow across the edge as 

$$f = \beta \cdot \frac{\Delta(e)}{2} + (\beta - 1) \cdot f',$$

where $\Delta(e)$ is defined as before, and $f'$ is the (possibly negative) amount of flow that was 
sent in the direction of the imbalance in the previous iteration. 

(3b) Obtain the amount of flow actually sent across the edge by adjusting f for the capacity of 
the edge and for the amount of commodity available at the sending queue. 

Note that the value of f computed in Step (3a) can not only exceed the available edge 
capacity, but may also be larger than the amount of commodity available at the sending queue. 

Idealized and Realistic Versions. We distinguish two cases depending on how Step (3b) is 
handled if the amount of commodity available at the sending queue is smaller than the flow to 
be sent across that edge as calculated in Step (3a). In the idealized algorithm, we treat the flow 
accumulated at each node as just some (possibly negative) number, and we send out as much 
flow as the capacity constraint permits even if the amount of commodity stored at a sending 
queue becomes negative as a result. In the realistic algorithm, we treat the flows as physical 
flows, so the amount of flow at a node can never become negative. Thus, we send out the minimum 
of the flow calculated in Step (3a), the capacity of the edge, and the flow in the sending queue. 
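Steps (3a) and (3b), with the idealized and realistic variants, can be sketched for a single edge. This is an illustrative Python sketch, not the authors' code; the function name and arguments are hypothetical, and the treatment of a negative f (as flow sent against the imbalance, bounded by the reverse capacity and, in the realistic variant, by the amount in the opposite queue) is our own assumption.

```python
def second_order_edge_flow(delta, f_prev, beta, cap_fwd, cap_bwd,
                           avail_fwd, avail_bwd, realistic=True):
    """Flow to send on one edge in the direction of the imbalance (sketch).

    delta     -- imbalance Delta(e) >= 0, from the fuller to the emptier queue
    f_prev    -- flow sent in that direction in the previous round (may be negative)
    cap_fwd / cap_bwd     -- edge capacity with / against the imbalance
    avail_fwd / avail_bwd -- commodity in the fuller / emptier queue
    A negative return value means flow is sent against the imbalance (assumption).
    """
    # Step (3a): beta times the first-order flow plus (beta - 1) times the previous flow.
    f = beta * delta / 2 + (beta - 1) * f_prev
    # Step (3b): respect the capacity in whichever direction f points ...
    upper, lower = cap_fwd, -cap_bwd
    if realistic:
        # ... and, in the realistic variant, never drive a sending queue negative.
        upper, lower = min(upper, avail_fwd), max(lower, -avail_bwd)
    return max(lower, min(f, upper))


# beta = 1.0 reproduces the first-order rule min{Delta/2, c}.
print(second_order_edge_flow(4.0, 10.0, 1.0, 5.0, 5.0, 100.0, 100.0))  # 2.0
```

Passing realistic=False gives the idealized variant, in which only the capacity bounds the flow and a sending queue may go negative.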

We expect the idealized algorithm to converge faster and, in general, to have smoother 
convergence properties than the realistic algorithm. In order to solve the standard sequential 
maximum flow problem, it suffices to implement the idealized case. However, if we want to solve 
the flow problem online in a distributed environment as flow continuously enters the source, 
the realistic algorithm must be employed. In what follows, our experimental results are for the 
realistic algorithm unless stated otherwise. 

Second-order Methods for Distributed Approximate Single- and Multicommodity Flow 

3.3 Experimental Evaluation 

In this subsection, we present a number of experimental results on the behavior of the 
first-order and second-order methods. Due to space constraints, we cannot hope to provide a 
detailed study of the behavior of the methods on different classes of input graphs. Instead, we 
present a few selected results that illustrate the most interesting aspects of the behavior of the 
algorithms, and provide a brief summary of other results at the end. Some information about our 
implementation, and about the graphs used in the experiments, can be found in the appendix. 



Dependence on $\beta$. We first look at the performance of the second-order method for different 
values of the parameter $\beta$. Figure 1 shows the flow arriving at the sink in each time step, for 
several choices of $\beta$ ranging from 1.0 to 1.95, using a 20-level mesh graph with 402 nodes and 
1180 edges. The results in Figure 1 show that the rate of convergence increases significantly 
as we increase $\beta$ from 1.0 to 1.95. In particular, after 1500 iterations, the first-order method 
($\beta = 1.0$) is still more than 10% away from the exact solution. In contrast, the second-order 
method with $\beta = 1.95$ has already converged to within 0.001%, and with a few thousand more 
iterations it reaches essentially floating point precision. 

Figure 2 shows the behavior of the algorithms for very small and very large values of $\beta$. 
In particular, we see that for $\beta = 0.5$ the performance of the algorithm becomes even worse 
than in the first-order method, while for $\beta = 2.5$, the method becomes unstable and does not 
converge to a final value. We point out that we observed a similar overall behavior on all the 
graphs that we tested, with very rapid convergence for the best values of $\beta$ (usually, but not 
always, around 1.9), slower convergence for smaller values of $\beta$, and instability as we increase 
$\beta$ beyond 2.0. 

In general, the “optimal” $\beta$, namely, the one that gives the fastest convergence, is probably 
a complex function of the eigenstructure of the underlying graph. This is provably the case for 
second-order methods for the distributed load balancing problem [MGS98]. Although in many 
of the examples we show here, the optimal $\beta$ is large (around 1.95), there are cases where a 
smaller value of $\beta$ is preferable; see Section 4.2 for one such example. 



Convergence of Edge Flows. The results in Figure 1 indicate a very rapid convergence of 
the amount of flow that arrives at the sink. However, this does not directly imply that all the 
flows inside the network converge to a steady state. To investigate whether this is the case, we 
define the flow change norm as the sum, over all edges, of the absolute value of the change in 
flow between the current and the previous iteration. Thus, if this norm converges to zero, then 
the network converges to a steady flow state. 
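The flow change norm is a one-line computation. In the hypothetical Python helper below, each edge flow is assumed to be a signed number with respect to a fixed reference orientation of the edge, so a reversal of direction counts as the full change.

```python
def flow_change_norm(prev_flows, cur_flows):
    """Sum over all edges of |f_t(e) - f_{t-1}(e)|.

    Edge flows are assumed signed w.r.t. a fixed orientation of each edge,
    listed in the same edge order for both rounds.
    """
    if len(prev_flows) != len(cur_flows):
        raise ValueError("both rounds must list the same edges")
    return sum(abs(cur - prev) for prev, cur in zip(prev_flows, cur_flows))


print(flow_change_norm([1.0, -2.0, 0.5], [1.5, -1.0, 0.5]))  # 1.5
```

A run could then be stopped once this norm stays below a chosen threshold for some number of consecutive rounds; the threshold and window would be tuning choices that the text does not specify.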

Figure 3 shows the behavior of this norm for $\beta$ equal to 1.0, 1.5, and 1.95, for the mesh 
graph considered before. As can be seen, the flow change norm converges to zero. Convergence 
is again most rapid for values of $\beta$ around 1.9. Note that for the first 150 or so iterations, the 
flow change norm for $\beta = 1.95$ is actually larger than that of the other curves, indicating a 
faster initial response to the injected flow. A similar rapid convergence behavior of the flows 
was observed in all our experiments. 

Fig. 1. Convergence of the second-order method with $\beta$ set to 1.0 (lower curve), 1.2, 
1.4, 1.6, 1.8, and 1.95 (upper curve). 

Fig. 2. Behavior for $\beta = 2.5$ (upper curve) and $\beta = 0.5$ (lower curve). 

Fig. 3. Convergence of the flow change norm for $\beta$ equal to 1.0, 1.5, and 1.95. 

The convergence of the flows is significant because it allows us to directly use the stabilized 
flow in the network as an approximate solution for the standard offline maximum flow problem, 
instead of computing the flow by averaging out the history of the edge flows, as suggested in 
[AL93]. Averaging the history implies that the algorithm must be run for a much longer period to 
obtain a good approximation, since the approximation ratio is then given by the ratio of the area 
under the curve to the area under the horizontal line at the height of the maximum flow. 
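The contrast between the two ways of reading off a solution can be made concrete. Given the per-round amounts $a_t$ arriving at the sink and the maximum flow value F, averaging over T rounds yields the area ratio $\frac{1}{T F}\sum_{t=1}^{T} a_t$, while using the stabilized flow only needs the last round. The Python helpers and the arrival numbers below are hypothetical, chosen only to illustrate the arithmetic.

```python
def averaged_ratio(arrivals, max_flow):
    """Ratio from averaging the history of T rounds: the area under the
    arrival curve divided by the area T * max_flow."""
    return sum(arrivals) / (len(arrivals) * max_flow)


def last_round_ratio(arrivals, max_flow):
    """Ratio from using the stabilized flow of the last round directly."""
    return arrivals[-1] / max_flow


# Hypothetical run: the arrival rate ramps up, then stays at the maximum flow value 1.0.
arrivals = [0.0, 0.5, 0.9] + [1.0] * 7
print(averaged_ratio(arrivals, 1.0), last_round_ratio(arrivals, 1.0))
```

Here averaging gives roughly 0.84 although the flow already sits at the maximum; the ramp-up rounds keep diluting the average, which is why averaging needs many more rounds to approach 1.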



Idealized Second-Order Method. Recall that in Step (3b) of the second-order method, we 
may have to adjust the amount of flow sent across an edge in order to avoid getting a negative 
amount of commodity in the sending queue. In the following, we investigate how the behavior 
of the algorithm changes if we allow negative amounts of commodity at the nodes, that is, we 
consider the idealized second-order method described in Subsection 3.2, which does not adjust 
the flow for the amount of available commodity. 

Figure 4 shows the convergence of the idealized and realistic methods for different values 
of $\beta$, for the mesh graph considered before. Note that for $\beta = 1.95$, the flow converges to more 
than 15 digits of accuracy in less than 1000 iterations. If we increase $\beta$ further towards 2.0, 
we notice that the flow starts oscillating more strongly, and for values beyond 2.0 the method 
no longer converges. Figure 5 shows the behavior of the idealized method for the case 
of $\beta = 2.0$. (For the realistic method, this effect appears to be slightly less abrupt, in that the 
method becomes unstable more slowly as we increase $\beta$ beyond 2.0.) 

Note that whether allowing negative amounts of commodity at the nodes is appropriate or 
not depends on the particular application. If the goal is just to find a solution to the maximum 
flow problem, and the actual routing of the commodities is done in a separate phase afterwards, 
then the idealized version is fine. On the other hand, a major advantage of the distributed 
methods is that they overlap the process of finding the flow paths with that of routing the 
commodities, in which case the idealized version is not appropriate. 
Fig. 4. Convergence of the idealized (solid lines) 
and realistic (dashed lines) second-order method 
with $\beta$ equal to 1.0 (lower curve), 1.2, 1.4, 1.6, 
1.8, and 1.95 (upper curve). Note that the two 
lowest dashed curves are hidden by the corresponding 
solid curves. 

Fig. 5. Behavior of the idealized second-order 
method with $\beta = 2.0$. 



4 Multicommodity Flow 

In this section, we consider the case of multiple commodities. We first outline the first-order 
algorithm, which is a slightly simplified version⁵ of the algorithm proposed by Awerbuch and 
Leighton [AL93], and describe the modifications needed for the second-order method. We then 
present our experimental results. 

4.1 Description of the Algorithms 

As in the single-commodity case, the algorithm proceeds in parallel rounds (or iterations). In 
our first-order implementation, the following operations are performed in each round. 

(1) Add $d_i$ units of commodity i to source node $s_i$, for each commodity i. 

(2) For each node v and each commodity i, partition the amount of commodity i that is 
currently in node v evenly among the $\delta(v)$ local queues of the $\delta(v)$ incident edges. 

(3) In each edge e, attempt to balance the amount of each commodity between the two queues 
at the endpoints of the edge, subject to the capacity constraint of the edge. Several 
commodities may be contending for the capacity of the edge; this contention is resolved in the 
following way: 

Let $\Delta_i(e)$ be the difference in the amount of commodity i between the two queues at the 
endpoints of edge e. The flow $f_i$ for commodity i is computed from the $d_i$, the $\Delta_i(e)$, and the 
edge capacity by using the algorithm described in Section 2.4.1 of [AL93], the details of 
which are omitted here. 

(4) Remove from the network any commodity that has reached the appropriate sink. 

⁵ In particular, we drop the $\varepsilon$ terms needed for the analysis in [AL93]. 
The second-order method can again be obtained with only a minor change in the algorithm. 
In particular, we compute 

$$\Delta'_i(e) = \beta \cdot \Delta_i(e) + 2.0 \cdot (\beta - 1) \cdot f'_i,$$

where $f'_i$ is the amount of commodity i sent across the edge in the previous iteration. In the 
non-idealized version of the algorithm, where we do not allow negative amounts of commodity, 
we also have to adjust $\Delta'_i(e)$ if the flow it calls for is larger than the amount of commodity i available in the 
sending queue; this leads to the idealized and realistic cases as with the maximum flow problem. 
We then apply the same algorithm as in the first-order method to resolve contention between 
the different commodities, but use the $\Delta'_i(e)$ in place of the $\Delta_i(e)$. 
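The modified imbalance is computed independently for each commodity. The Python sketch below (hypothetical names) evaluates $\Delta'_i(e) = \beta \Delta_i(e) + 2(\beta - 1) f'_i$ for all commodities on one edge. Half of $\Delta'_i(e)$ equals $\beta \Delta_i(e)/2 + (\beta - 1) f'_i$, the single-commodity second-order flow of Step (3a), which explains the factor 2.0. The contention-resolution procedure of [AL93] that consumes these values is not reproduced here.

```python
def modified_imbalances(deltas, prev_flows, beta):
    """Delta'_i(e) = beta * Delta_i(e) + 2.0 * (beta - 1) * f'_i for every commodity i.

    deltas     -- Delta_i(e), imbalance of commodity i between the two queues of edge e
    prev_flows -- f'_i, flow of commodity i sent across e in the previous round
    """
    return [beta * d + 2.0 * (beta - 1) * f for d, f in zip(deltas, prev_flows)]


deltas, prev = [4.0, 1.0, 0.0], [1.0, -0.5, 2.0]
print(modified_imbalances(deltas, prev, 1.5))  # [7.0, 1.0, 2.0]
```

With $\beta = 1$ the previous flows drop out and the first-order imbalances are returned unchanged.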

4.2 Experimental Results 

We now present experimental results on the performance of the second-order method. Due to 
space constraints, we can only give a few selected results. 

Sample Performance Results. Figure 6 shows the behavior of the idealized second-order 
method with $\beta = 1.95$ on a 5 x 5 x 20 RMF graph with 5 sources and sinks selected at random 
from the nodes in the first and last level of the graph, respectively. The demands for the flows 
were chosen such that the flow is feasible, but within about 2% of the upper bound given 
by the maximum concurrent flow. The figure shows the 5 flows converging to their respective 
demands. After about 4500 iterations, all flows have converged to within 16 digits of precision. 
In contrast, if we use the first-order method on this problem, then we need more than 10000 
iterations to converge to within 10% of the demands. 




Fig. 6. Convergence of the idealized second-order 
method with $\beta = 1.95$ on an RMF graph with five 
commodities. 

Fig. 7. Convergence of the realistic and idealized 
second-order methods with different values of $\beta$, 
on a 500 node RMF graph with 25 commodities. 
For each case, we plot the maximum and minimum 
flow/demand ratios over all commodities. 



Figure 7 shows the behavior of the second-order method for a 5 x 5 x 20 RMF graph 
with 25 commodities routed between the first and the last layer of the graph, with demands 
chosen at random and then scaled such that they are within 1% of the maximum concurrent 
flow. The values measured on the y-axis are the minimum and maximum fractions z, over all 
commodities, such that z times the demand of a commodity arrives at its sink in a given step. 
The figure plots the convergence behavior for the realistic second-order method with $\beta = 1.0$, 
1.5, and 1.95, and for the idealized second-order method with $\beta = 1.95$ and 1.99. It 
shows a clear advantage of the second-order over the first-order method, and of the idealized 
over the realistic method. 

Dependence on $\beta$. The behavior of the second-order multicommodity flow algorithms for 
varying values of $\beta$ turned out to be similar to that of the second-order maximum flow 
algorithm. While for most of our input graphs the optimal value of $\beta$ was between 1.95 and 
1.99, there are other classes of graphs where the optimal value is significantly smaller; see 
Figures 8 and 9 for an example. 





Fig. 8. Behavior of the idealized second-order 
method on a 5 node clique graph with 5 commodities 
and $\beta = 1.4$. 

Fig. 9. Behavior of the idealized second-order 
method on a 5 node clique graph with 5 commodities 
and $\beta = 1.98$. 



Running Times. In Table 1, we provide some very preliminary timing results. All timings were measured on a Sun Ultra 30 workstation with a 300 MHz UltraSPARC-II processor and 256 MB of RAM, and the codes were compiled with the -O option using the vendor-supplied C compiler.

As the input graph, we used a 5 x 5 x 20 RMF graph with 25, 50, and 100 commodities. All demands had the same value, while the capacities of the forward edges in the RMF graph were chosen at random. The sources and sinks were chosen from the first and last panels of the graph, respectively.*

We give running times for four different methods: (1) the basic first-order method, as described by Awerbuch and Leighton [AL93], (2) the realistic second-order method with β = 1.99, (3) the idealized second-order method with β = 1.97, and (4) the Maximum Concurrent Flow code of Leong, Shor, and Stein [LSS93], referred to as LSS.

* Since the number of nodes in the first panel is 25, the number of “commodity groups” (see [LSS93]) in the implementation of Leong, Shor, and Stein is at most 25, independent of the number of commodities.



Algorithm                           25 commodities  50 commodities  100 commodities
Leong-Shor-Stein (LSS)                      519.77          456.10           501.72
First-order (Awerbuch-Leighton)             642.99         1233.32          2836.62
Realistic second-order, β = 1.99            149.01          304.64           645.16
Idealized second-order, β = 1.97              9.54           27.70            70.41


Table 1. Running times (in seconds) of the different algorithms on a 500-node RMF graph. For LSS, we chose ε = 0.05, while for the other codes, we terminated the runs after every commodity was within a 0.01 factor (first-order) or 0.001 factor (second-order) of its demand.



When looking at these numbers, the reader should keep the following points in mind:

(1) The code of Leong, Shor, and Stein [LSS93] solves the more general problem of maximizing the fraction of the demands that can be routed feasibly (the maximum concurrent flow), while our code only finds a feasible flow. However, we are not aware of any code for feasible flow that outperforms LSS. Following the suggestion in [LSS93], all our runs are performed with demands very close to the maximum feasible demands, obtained by scaling the demands using the maximum edge congestion returned by LSS.

(2) The results for LSS are most likely not optimal, as we were unsure about the best setting 
of parameters for the code. Given the results reported in [LSS93] and the increases in CPU 
speed over the last few years, we would have expected slightly better numbers. 

(3) We have not yet implemented a good termination condition for our code. Instead, we chose 
to measure the time until all flows at the sinks have converged to within a factor of at most 
0.001 (second-order method) or 0.01 (first-order method) of the demands. 

(4) We limit the reported numbers to RMF graphs due to differences in the graph formats used 
in LSS and in our code, which did not allow a direct comparison on other types of graphs. 
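Point (3) and the y-axis of Fig. 7 both rest on the ratio between the flow that reaches each sink in a step and that commodity's demand. A minimal sketch of that check follows; it assumes that "within a factor of tol" means |flow_i / demand_i - 1| ≤ tol for every commodity i, and the function names are hypothetical.

```python
# Sketch of the stopping rule described in point (3). Assumption: "within a
# factor of tol of the demand" means |arrived_i / demand_i - 1| <= tol for
# every commodity i, where arrived_i reaches the sink in the current step.


def flow_demand_ratios(arrived, demands):
    """Return (min, max) of arrived[i] / demands[i] over all commodities."""
    ratios = [f / d for f, d in zip(arrived, demands)]
    return min(ratios), max(ratios)


def converged(arrived, demands, tol):
    """True if every commodity's arriving flow is within a factor tol of its demand."""
    lo, hi = flow_demand_ratios(arrived, demands)
    return 1.0 - tol <= lo and hi <= 1.0 + tol


# Tolerances quoted in the text for the two families of methods.
TOL_FIRST_ORDER = 0.01
TOL_SECOND_ORDER = 0.001

if __name__ == "__main__":
    demands = [10.0, 4.0, 7.5]
    arrived = [9.996, 4.001, 7.49]  # hypothetical per-step sink arrivals
    print(flow_demand_ratios(arrived, demands))
    print(converged(arrived, demands, TOL_FIRST_ORDER))
    print(converged(arrived, demands, TOL_SECOND_ORDER))
```

Tracking only the minimum and maximum ratio keeps the test cheap, and the same two numbers are what Fig. 7 plots per step.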

We point out that the behavior of the LSS algorithm is fairly complex, while the performance of our second-order methods depends on the precise choice of β. Thus, one should be careful when trying to infer general performance trends from the few numbers provided above. However, our experiments with other graphs showed similar behavior. Thus, we believe that our implementation is at least competitive with the best previous codes, and may in fact significantly outperform them. We plan to perform a more thorough study in the future. We also see significant room for further improvements in the running times of our codes.
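The relative speeds implied by Table 1 can be recomputed directly. This short sketch only divides the reported times (the dictionary keys are our own labels); given points (1) to (4) above, in particular the different termination criteria, the resulting ratios are indicative rather than a like-for-like comparison.

```python
# Ratios computed from the running times in Table 1 (seconds, 500-node RMF
# graph). Termination criteria differ between codes, so treat these ratios
# as indicative only.

COMMODITIES = (25, 50, 100)
TIMES = {
    "LSS": (519.77, 456.10, 501.72),
    "first-order": (642.99, 1233.32, 2836.62),
    "realistic second-order": (149.01, 304.64, 645.16),
    "idealized second-order": (9.54, 27.70, 70.41),
}


def speedup_over(baseline, method):
    """Per-column time(baseline) / time(method); > 1 means method is faster."""
    return tuple(b / m for b, m in zip(TIMES[baseline], TIMES[method]))


def growth(method):
    """Ratio of the 100-commodity time to the 25-commodity time."""
    t = TIMES[method]
    return t[-1] / t[0]


if __name__ == "__main__":
    for name in TIMES:
        if name != "LSS":
            ratios = ", ".join(f"{r:.2f}" for r in speedup_over("LSS", name))
            print(f"LSS / {name} at {COMMODITIES}: {ratios}; "
                  f"growth 25 -> 100 commodities: {growth(name):.2f}x")
```

The growth column makes one trend visible that the raw table hides: the LSS time stays roughly flat as the number of commodities grows, consistent with the footnote on commodity groups, while the first-order and second-order times increase with the number of commodities.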

Sensitivity Analysis. An attractive feature of local algorithms is that they are, in general, robust. That is, they are expected to adapt gracefully when edges appear or disappear, or when traffic patterns change [AL93]. We will not try to formalize this intuition here. In Figure 10, we present an illustrative example of the behavior of local flow algorithms in dynamic situations, which shows how the resulting flows adapt quickly as we change the demands of commodities.



Fig. 10. Sensitivity of the algorithm to changes in demands, for the idealized method with β = 1.98 on
a 500-node RMF graph with 5 commodities. We show the amounts of flow arriving at the sinks as we
repeatedly change the demands, and thus the amounts of commodity injected into the network in each
step.



5 Concluding Remarks 

In this paper, we have proposed second-order methods for distributed, approximate maximum flow and multicommodity flow based on the first-order algorithms recently proposed by Awerbuch and Leighton [AL93, AL94]. We have presented experimental results that illustrate several interesting aspects of the behavior of these algorithms, and that provide strong evidence that the second-order methods significantly outperform their first-order counterparts.

The main open problem raised by our results is to give a formal analysis of the performance of the second-order methods for multicommodity flow, or at least to show a separation between first-order and second-order methods. We believe that this is a very challenging technical problem. Our experimental results also raise, and leave open, a number of other intriguing questions concerning the behavior of such distributed flow algorithms and the diffusive processes underlying them. We list a few below.

Question 1. It would be very interesting to show that not only the amount of flow reaching the sinks, but in fact the entire “flow pattern” in the network converges to a stable state. This was the case in all our experiments. If true, this would simplify the process of stopping the iteration in a distributed manner when the flows have converged; furthermore, it may improve the analytical bounds on the performance of the algorithm, since we would not have to average the flows over several steps as suggested in [AL93].

Question 2. For the case of the maximum flow problem, it would be interesting to show bounds that are tighter than those implied by the analysis for multicommodity flow in [AL93].† In particular, it appears from our experiments that the convergence behavior of the maximum flow algorithms may be significantly better than 1/ε.

† As far as we know, this question is still open even in the first-order maximum flow case.

Question 3. Suppose the flow injected into the sources at each iteration consists of a collection of packets. Can we analyze or bound the delays of the packets, given an appropriate scheduling policy for packets at each node (such as first-in-first-out), if only for the first-order methods? This would correspond to providing certain quality-of-service guarantees to the sessions in communication networks. Such an analysis was recently carried out for load balancing [MR98] and packet routing [AK+98] under adversarial models of traffic injection, but assuming unit edge capacities.

Question 4. As mentioned earlier, random walks can be modeled as a matrix iteration that is identical to the iteration of first-order algorithms for distributed load balancing [MGS98]. Can we design random walks that correspond to second-order algorithms? This may lead to improved bounds for mixing times of random walks. Some progress has been made recently for special graphs [S98]. Another question that arises is whether random walks can be set up to yield first- or second-order behavior in the presence of edge capacities.

We are working on several extensions of our experimental results. In particular, we are working on an implementation of the improved first-order algorithm presented in [AL94], and on dynamic acceleration schemes for the second-order method, such as those using Chebyshev polynomials with a β that varies from iteration to iteration. We are also in the process of carrying out a thorough comparison of our distributed implementations with existing sequential multicommodity codes (see [LSS93] and the references therein).
7 Appendix: Experimental Setup

Implementation Details. All algorithms were implemented in C. A graphical frontend based on Tcl/Tk 
was used to run experiments and display the results. All input graphs were supplied in the DIMACS graph 
format, with some extensions to specify multiple commodities and changes in the demands over time. 

Most of the execution time is spent in Steps (2) and (3) of the algorithm, which were implemented together in a single loop over the edges. Thus, the partitioning of the commodities between the queues was done during the edge balancing process, by applying an appropriate scaling factor to the flow stored in a node. This resulted in a very efficient implementation for the maximum flow case.
For the multicommodity flow case, the running time of Step (3) is dominated by the algorithm for resolving contention between different commodities in Section 2.4.1 of [AL93], which requires sorting the commodities on each edge by the values of Δ_i(e)/d_i. While these values vary between iterations, the changes become increasingly smaller as the method converges. We exploited this property by using insertion sort and inserting the commodities in the sorted order of the previous iteration.
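The warm-started sort can be sketched as follows. This is a minimal illustration, not the authors' C code: it assumes commodities are kept in descending order of Δ_i(e)/d_i and that `delta` and `demand` are plain per-commodity lists; both are assumptions, since the text does not fix them. Because insertion sort performs one shift per inverted pair, starting from the previous iteration's order makes the cost proportional to how much the order actually changed.

```python
# Sketch: re-sort the commodities on one edge by delta[i] / demand[i],
# starting from the previous iteration's order (warm-started insertion sort).
# Assumptions: descending order; delta and demand are per-commodity lists.


def resort_commodities(prev_order, delta, demand):
    """Return (order, moves): commodities sorted by delta[i] / demand[i] in
    descending order, and the number of element shifts performed."""
    key = [dl / dm for dl, dm in zip(delta, demand)]
    order = list(prev_order)
    moves = 0
    for j in range(1, len(order)):
        item = order[j]
        k = j - 1
        # Shift entries with a smaller key one slot to the right.
        while k >= 0 and key[order[k]] < key[item]:
            order[k + 1] = order[k]
            moves += 1
            k -= 1
        order[k + 1] = item
    return order, moves


if __name__ == "__main__":
    demand = [2.0, 1.0, 4.0, 5.0]
    order, _ = resort_commodities([0, 1, 2, 3], [2.4, 1.0, 2.0, 0.5], demand)
    # Next iteration: the keys of commodities 0 and 1 cross over.
    order, moves = resort_commodities(order, [2.0, 1.1, 2.0, 0.5], demand)
    print(order, moves)
```

Sorting the same keys from a fully reversed order would cost the maximum number of shifts, which is the work the warm start avoids once the values settle.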

Input Graphs. 

In our experiments described in this paper, we used three different classes of input graphs: mesh graphs, random leveled graphs, and RMF graphs. The first two types of graphs were generated using the GENGRAPH program of Anderson et al. from the University of Washington. The RMF graphs were generated with the GENRMF program of Tamas Badics. Both programs are available from the DIMACS website. Examples of these graphs are shown in Figures 11, 12, and 13.




Fig. 11. Mesh graph with 3 levels and 14 nodes.
All edges have randomly chosen capacity, except 
for edges connecting to the source or sink, which 
have capacity large enough such that they never 
constitute a bottleneck. 



Fig. 12. Random leveled graph with 3 levels and
14 nodes. All edges have randomly chosen capacity,
except for edges connecting to the source or
sink, which have capacity large enough such that
they never constitute a bottleneck.




Fig. 13. A 3 x 3 x 2 RMF graph. All edges between different layers have randomly chosen capacity,
while edges inside a layer have capacity large enough such that they never constitute a bottleneck.
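To make the a x a x b notation concrete (the 5 x 5 x 20 graph used for Table 1 has 500 nodes), here is a simplified RMF-style generator in Python. It is not GENRMF: the layout follows the description in Fig. 13, while the random permutation linking consecutive frames, the capacity range and the helper names are assumptions. It also picks commodities with sources in the first frame and sinks in the last frame, as in the timing runs. If commodity groups correspond to distinct sources, as the footnote on the running-time setup suggests, their number can never exceed a*a = 25.

```python
# Simplified RMF-style generator (not GENRMF itself). Assumptions:
#  * b frames, each an a x a grid, grid neighbours joined in both directions
#    with a capacity large enough never to be a bottleneck;
#  * node k of frame l -> node perm[k] of frame l + 1 for a random
#    permutation perm, with a random integer capacity in [c_min, c_max].
import random


def node_id(a, layer, row, col):
    """Number nodes frame by frame, row by row."""
    return (layer * a + row) * a + col


def rmf_graph(a, b, c_min=1, c_max=100, rng=None):
    """Return (num_nodes, arcs), each arc a (tail, head, capacity) tuple."""
    if rng is None:
        rng = random.Random()
    big = c_max * a * a  # at least the total forward capacity leaving any frame
    arcs = []
    for layer in range(b):
        for r in range(a):
            for c in range(a):
                u = node_id(a, layer, r, c)
                if c + 1 < a:  # horizontal grid neighbour
                    v = node_id(a, layer, r, c + 1)
                    arcs += [(u, v, big), (v, u, big)]
                if r + 1 < a:  # vertical grid neighbour
                    v = node_id(a, layer, r + 1, c)
                    arcs += [(u, v, big), (v, u, big)]
        if layer + 1 < b:  # forward arcs into the next frame
            perm = list(range(a * a))
            rng.shuffle(perm)
            for k in range(a * a):
                tail = layer * a * a + k
                head = (layer + 1) * a * a + perm[k]
                arcs.append((tail, head, rng.randint(c_min, c_max)))
    return a * a * b, arcs


def pick_commodities(a, b, k, rng=None):
    """k (source, sink) pairs: sources in the first frame, sinks in the last."""
    if rng is None:
        rng = random.Random()
    first = range(a * a)
    last = range((b - 1) * a * a, b * a * a)
    return [(rng.choice(first), rng.choice(last)) for _ in range(k)]


if __name__ == "__main__":
    n, arcs = rmf_graph(5, 20, rng=random.Random(1))
    pairs = pick_commodities(5, 20, 100, rng=random.Random(2))
    groups = {s for s, _ in pairs}  # distinct sources among the commodities
    print(n, len(arcs), len(groups))
```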